CN109429078A - Video processing method and apparatus, and device for video processing - Google Patents


Info

Publication number
CN109429078A
Authority
CN
China
Prior art keywords
target
video frame
video
information
article
Prior art date
Legal status
Granted
Application number
CN201710737274.2A
Other languages
Chinese (zh)
Other versions
CN109429078B (en)
Inventor
张�杰
卜海亮
靳笑
靳一笑
邢真臻
蒋品
冯新强
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201710737274.2A
Publication of CN109429078A
Application granted
Publication of CN109429078B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
              • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
                • H04N21/233 Processing of audio elementary streams
                • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
                • H04N21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
            • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
                • H04N21/439 Processing of audio elementary streams
                • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V20/00 Scenes; Scene-specific elements
            • G06V20/40 Scenes; Scene-specific elements in video content
            • G06V20/60 Type of objects
              • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention provide a video processing method and apparatus, and a device for video processing. The method specifically includes: performing image recognition on a video frame contained in a video to obtain an image recognition result corresponding to the video frame; obtaining, from a preset item library, a target item that matches the image recognition result; and adding target information corresponding to the target item to the video frame. Embodiments of the present invention can shorten video processing time, improve video processing efficiency, and increase the video coverage of the target information.

Description

Video processing method and apparatus, and device for video processing
Technical field
The present invention relates to the field of video technology, and in particular to a video processing method and apparatus, and a device for video processing.
Background
With the development of Internet technology, more and more users are accustomed to watching videos on terminals such as computers and mobile phones. Specifically, users can watch videos of interest through the player of a locally installed client or through a player embedded in a web page.
Currently, information can be added to a video through video processing. In an existing scheme, information is added to a video manually: after watching the video, an operator first extracts the video frames that are suitable for adding information, then obtains the information corresponding to those frames, and finally inserts the obtained information into the frames using an editing system.
However, adding information to videos manually in this way consumes considerable time and labor, which results in low video processing efficiency.
Summary of the invention
In view of the above problems, embodiments of the present invention are proposed to provide a video processing method, a video processing apparatus and a device for video processing that overcome, or at least partially solve, the above problems. Embodiments of the present invention can shorten video processing time, improve video processing efficiency, and increase the video coverage of the target information.
To solve the above problems, the present invention discloses a video processing method, comprising:
Performing image recognition on a video frame contained in a video to obtain an image recognition result corresponding to the video frame;
Obtaining, from a preset item library, a target item that matches the image recognition result; and
Adding target information corresponding to the target item to the video frame.
In another aspect, the present invention discloses a video processing apparatus, comprising:
An image recognition module, configured to perform image recognition on a video frame contained in a video to obtain an image recognition result corresponding to the video frame;
A target item obtaining module, configured to obtain, from a preset item library, a target item that matches the image recognition result; and
A target information adding module, configured to add target information corresponding to the target item to the video frame.
Optionally, the target item obtaining module includes:
A judging submodule, configured to judge whether the image recognition result contains a second item that is identical or similar to, or of the same category as, a first item in the preset item library, and if so, to take the first item as the target item that matches the image recognition result.
Optionally, the judging submodule includes:
A matching unit, configured to match feature information of the second item contained in the image recognition result against feature information of the first item in the preset item library to obtain a matching result;
A target item determination unit, configured to determine, if the matching result is a successful match, that the image recognition result contains a target item that is identical or similar to, or of the same category as, the first item in the preset item library;
Where the feature information includes at least one of shape, color and category.
Optionally, the target information adding module includes:
A target position determination submodule, configured to determine a target position in the video frame for adding the target information;
An adding submodule, configured to add the target information at the target position in the video frame.
Optionally, the target position determination submodule includes:
A first target position determination unit, configured to determine the degree of fit between the existing items in the video frame and the target item, and to take, from the existing items in the video frame, the position of an item whose degree of fit meets a preset condition as the target position; and/or
A second target position determination unit, configured to identify, in the video frame, a preset image target region that is suitable for adding the target information, and to use the preset image target region as the target position.
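The two target position strategies can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the source: the `ExistingItem` type, the toy `fit_degree` score, the `RELATED` category table and the `min_fit` threshold standing in for the "preset condition" are all hypothetical. The first unit picks the best-fitting existing item; if none qualifies, the sketch falls back to a preset image target region, as the second unit would.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height) in pixels


@dataclass
class ExistingItem:
    """An item already visible in the frame (from image recognition)."""
    name: str
    category: str
    box: Box


# Hypothetical table of categories that "fit" a given target category.
RELATED: Dict[str, Set[str]] = {"beverage": {"cup", "bottle"}, "watch": {"wrist"}}


def fit_degree(item: ExistingItem, target_category: str) -> float:
    """Toy degree of fit in [0, 1]: same category 1.0, related 0.6, else 0.0."""
    if item.category == target_category:
        return 1.0
    if item.category in RELATED.get(target_category, set()):
        return 0.6
    return 0.0


def target_position(items: List[ExistingItem], target_category: str,
                    preset_region: Optional[Box] = None,
                    min_fit: float = 0.5) -> Optional[Box]:
    """First unit: box of the best-fitting existing item whose fit meets the
    preset condition (fit >= min_fit). Otherwise fall back to the preset
    image target region (second unit), which may be None."""
    scored = [(fit_degree(it, target_category), it) for it in items]
    eligible = [(score, it) for score, it in scored if score >= min_fit]
    if eligible:
        _, best_item = max(eligible, key=lambda pair: pair[0])
        return best_item.box
    return preset_region
```

For example, a mug (category `"cup"`) would be chosen as the position for a beverage, while a watch advertisement in a scene with only a sofa would use the preset region.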
Optionally, the target position is a subtitle-related position;
The adding submodule includes:
A subtitle modification unit, configured to modify the subtitles contained in the video frame according to the target information, so as to add the target information into the subtitles of the video frame; and/or
A subtitle adding subunit, configured to add the target information around the subtitles in the video frame as additional information of the subtitles, so as to add the target information to the video frame.
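A minimal sketch of the two subtitle strategies, assuming the subtitles are available as plain text (for example from a subtitle track) rather than burned into the pixels. The function names and the choice to place the target information right after a keyword are hypothetical illustrations.

```python
from typing import List


def embed_in_subtitle(subtitle: str, keyword: str, target_info: str) -> str:
    """Subtitle modification: rewrite the subtitle so that the target
    information appears inside it, here right after the first mention of
    `keyword`. The subtitle is returned unchanged if the keyword is absent."""
    idx = subtitle.find(keyword)
    if idx == -1:
        return subtitle
    end = idx + len(keyword)
    return f"{subtitle[:end]} ({target_info}){subtitle[end:]}"


def attach_around_subtitle(subtitle_lines: List[str], target_info: str,
                           above: bool = False) -> List[str]:
    """Subtitle adding: leave the subtitle text untouched and place the
    target information next to it as an extra line, above or below."""
    extra = f"[{target_info}]"
    return [extra, *subtitle_lines] if above else [*subtitle_lines, extra]
```

The first function changes what viewers read; the second leaves the dialogue intact, which may be preferable when the subtitle wording must not be altered.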
Optionally, the target information adding module includes:
A video frame information modification submodule, configured to modify, according to the target information, the information corresponding to the target position in the video frame, so as to obtain a modified video frame that includes the target information; or
An additional submodule, configured to add the target information to the video frame as additional information corresponding to the target position in the video frame.
Optionally, the video frame information modification submodule includes:
A pixel value modification unit, configured to replace first pixel values corresponding to the target position in the video frame with second pixel values corresponding to the target information, where the second pixel values are determined according to target information in picture format and/or the color values of target information in text format; and/or
A text modification unit, configured to modify the text information corresponding to the subtitle position in the video frame, so as to change the text information at the subtitle position into target information in text format.
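The pixel value modification unit can be illustrated with a frame stored as rows of (R, G, B) tuples. In this hypothetical sketch, picture-format target information is a patch of second pixel values (with `None` marking transparent pixels), and text-format target information is first rasterized from a simple glyph mask using its color value; a real system would use an imaging library and proper font rendering.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
Frame = List[List[RGB]]            # frame[row][col] -> (R, G, B)
Patch = List[List[Optional[RGB]]]  # None = keep the original pixel


def replace_pixels(frame: Frame, top: int, left: int, patch: Patch) -> Frame:
    """Replace the first pixel values at the target position (top, left)
    with the second pixel values from `patch`. Pixels outside the frame and
    None (transparent) entries are skipped. Returns a new frame."""
    out = [row[:] for row in frame]
    for dy, patch_row in enumerate(patch):
        for dx, value in enumerate(patch_row):
            y, x = top + dy, left + dx
            if value is not None and 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = value
    return out


def text_mask_to_patch(mask: List[str], color: RGB) -> Patch:
    """Text-format target information: turn a glyph mask ('#' = ink) into
    second pixel values using the text's color value."""
    return [[color if ch == "#" else None for ch in row] for row in mask]
```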
Optionally, the apparatus further includes:
An image tracking module, configured to perform image tracking on the image objects in consecutive video frames of the video;
A target information reuse module, configured to reuse, for an image object in a subsequent video frame and according to the image tracking result, the target information corresponding to the same image object in a preceding video frame.
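Reusing target information across tracked frames amounts to caching the library lookup per tracked object. The sketch below assumes a tracker (not shown, hypothetical) that assigns a stable `track_id` to the same image object across consecutive frames; the library lookup is passed in as a function so that the number of lookups saved is visible.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def add_info_with_reuse(frames: Iterable[List[Tuple[int, str]]],
                        lookup: Callable[[str], str]
                        ) -> Tuple[List[Dict[int, str]], int]:
    """`frames` holds, per frame, (track_id, label) pairs from a tracker.
    Target information found for a track in an earlier frame is reused for
    the same track later instead of querying the library again.
    Returns the per-frame assignments and the number of lookups made."""
    cache: Dict[int, str] = {}
    lookups = 0
    result: List[Dict[int, str]] = []
    for objects in frames:
        assigned: Dict[int, str] = {}
        for track_id, label in objects:
            if track_id not in cache:
                cache[track_id] = lookup(label)
                lookups += 1
            assigned[track_id] = cache[track_id]
        result.append(assigned)
    return result, lookups
```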
In yet another aspect, the present invention discloses a device for video processing, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and include instructions for performing the following operations:
Performing image recognition on a video frame contained in a video to obtain an image recognition result corresponding to the video frame;
Obtaining, from a preset item library, a target item that matches the image recognition result;
Adding target information corresponding to the target item to the video frame.
In a further aspect, the present invention discloses a machine-readable medium storing instructions which, when executed by one or more processors, cause a device to perform one or more of the video processing methods described above.
Embodiments of the present invention have the following advantages:
Embodiments of the present invention automatically identify the information in video frames by machine, obtain from the preset item library the target item that matches the image recognition result, and add the target information corresponding to the target item to the corresponding video frame. Because the target item matching the image recognition result of a video frame can be obtained quickly without manual intervention, video processing time is shortened and video processing efficiency is improved.
Moreover, with processing time shortened, the number of videos that can be processed per unit time can increase dramatically, and the number of machines processing videos can be scaled out almost without limit through a computing cluster. In this way, the video coverage of the target information can be increased.
Further, because embodiments of the present invention process videos by combining image recognition with matching against the preset item library, when the information in the library changes, the latest target items and their target information can be obtained through that matching. This improves the timeliness of the target information added to video frames and can even, to some extent, achieve real-time updates of the target information.
Brief description of the drawings
Fig. 1 is a step flowchart of Embodiment 1 of a video processing method of the present invention;
Fig. 2 is a step flowchart of Embodiment 2 of a video processing method of the present invention;
Fig. 3 is a structural block diagram of an embodiment of a video processing apparatus of the present invention;
Fig. 4 is a structural block diagram of a device 900 for video processing of the present invention when implemented as a terminal; and
Fig. 5 is a schematic structural diagram of a server in some embodiments of the present invention.
Detailed description of embodiments
To make the above objectives, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiments of the present invention provide a video processing scheme that can perform image recognition on a video frame contained in a video to obtain an image recognition result corresponding to the video frame, obtain from a preset item library a target item that matches the image recognition result, and then add the information corresponding to the target item to the video frame as the target information of that frame.
Embodiments of the present invention automatically identify the information in video frames by machine, obtain from the preset item library the target item that matches the image recognition result, and add the target information corresponding to the target item to the video frame. Because the target item matching the image recognition result of a video frame can be obtained quickly without manual intervention, video processing time is shortened and video processing efficiency is improved.
Moreover, with processing time shortened, the number of videos that can be processed per unit time can increase dramatically, and the number of machines processing videos can be scaled out almost without limit through a computing cluster, which further improves video processing efficiency.
Further, because embodiments of the present invention process videos by combining image recognition with matching against the preset item library, when the information in the library changes, the latest target items and their target information can be obtained through that matching. This shortens the update cycle of the target information; for example, real-time updates of target information can be achieved to some extent.
The video processing scheme provided by embodiments of the present invention can process videos from any video platform, and can process both offline videos and live (real-time playback) videos. A video platform may be any network platform that provides videos; in practical applications, examples include video websites and/or video apps (applications).
Referring to Fig. 1, an exemplary block diagram of a video processing system according to an embodiment of the present invention is shown. The system may include a video server 101, a video client 102 and a video processing apparatus 103. The video server 101 and the video client 102 may be located in a wired or wireless network, through which they exchange data; the video server 101 may also exchange data with the video processing apparatus 103 through a wired or wireless network.
In practical applications, the video server 101 can provide a first video to the video client 102 so that the video client 102 plays it; for example, the corresponding first video can be provided to the video client 102 in response to a play request or download request from the video client 102.
In addition, the video server 101 can provide a second video, to which information needs to be added, to the video processing apparatus 103. The video processing apparatus 103 can process the second video using the video processing scheme of the embodiments to obtain a second video with target information added, and send it to the video server 101.
In practical applications, the second video may be an offline video or a live video. When the second video is an offline video, such as a currently popular video, the video server 101 can send it to the video processing apparatus 103, obtain from the apparatus the offline video with target information added, and store that video. Then, when a play request or download request sent by the video client 102 is received, the first video provided to the video client 102 can be the stored second video with target information added that corresponds to the request.
When the second video is a live video, the video server 101 can receive a play request sent by the video client 102. For example, the play request may carry information such as the URL (Uniform Resource Locator) of the live video; the server can then obtain the live video according to the URL, send it to the video processing apparatus 103, and obtain from the apparatus the live video with target information added. The first video provided to the video client 102 is then the live video with target information added.
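The offline and live paths can be summarized in a small hypothetical sketch of the video server 101. Videos are plain `bytes`, `process` stands in for the video processing apparatus 103, and `fetch` stands in for retrieving a live stream by URL; none of these names come from the source. The point is where processing happens: once at ingest for offline videos, and per request for live videos.

```python
from typing import Callable, Dict, Optional


class VideoServer:
    """Hypothetical stand-in for video server 101."""

    def __init__(self, process: Callable[[bytes], bytes],
                 fetch: Callable[[str], bytes]) -> None:
        self.process = process   # stands in for video processing apparatus 103
        self.fetch = fetch       # hypothetical: retrieves a live stream by URL
        self.store: Dict[str, bytes] = {}

    def ingest_offline(self, video_id: str, video: bytes) -> None:
        """Offline path: process once and store the result for later requests."""
        self.store[video_id] = self.process(video)

    def handle_request(self, video_id: Optional[str] = None,
                       live_url: Optional[str] = None) -> bytes:
        """Return the first video for a play or download request."""
        if live_url is not None:
            # Live path: fetch by URL and process on demand.
            return self.process(self.fetch(live_url))
        if video_id is None:
            raise ValueError("either video_id or live_url is required")
        return self.store[video_id]
```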
It can be understood that the video processing system shown in Fig. 1 is merely an example of an application environment for the video processing method of the embodiments; the method can be applied in any application environment. For example, it can also be applied on the client side, where the video client 102 uses the video processing method of the embodiments to process the first video provided by the video server 101 and add target information to it. The embodiments of the present invention do not limit the specific application environment.
Method embodiment
Referring to Fig. 2, a step flowchart of an embodiment of a video processing method of the present invention is shown, which may specifically include the following steps:
Step 201: perform image recognition on a video frame contained in a video to obtain an image recognition result corresponding to the video frame;
Step 202: obtain, from a preset item library, a target item that matches the image recognition result;
Step 203: add target information corresponding to the target item to the video frame.
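Steps 201 to 203 can be sketched end to end as follows. The recognizer and the matching rule are injected as plain functions because the source does not fix a particular model; the `LibraryItem` and `Frame` types, and the category-equality matcher one might pass in, are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LibraryItem:
    """A first item in the preset item library (hypothetical shape)."""
    name: str
    category: str
    target_info: str


@dataclass
class Frame:
    index: int
    added_info: List[str] = field(default_factory=list)


def process_video(frames: List[Frame],
                  recognize: Callable[[Frame], List[str]],
                  library: List[LibraryItem],
                  matches: Callable[[str, LibraryItem], bool]) -> List[Frame]:
    for frame in frames:
        # Step 201: image recognition yields labels for the second items.
        for second in recognize(frame):
            # Step 202: first library item that matches the recognized item, if any.
            target = next((it for it in library if matches(second, it)), None)
            if target is not None:
                # Step 203: attach the target item's target information.
                frame.added_info.append(target.target_info)
    return frames
```

Because the library is consulted on every call, updating `library` changes the target information added to subsequently processed frames, which is the timeliness advantage described earlier.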
The embodiments of the present invention do not limit the source of the video in step 201. For example, the video may come from a video server or from a user. When it comes from a video server, it may be an offline video or a live video. When it comes from a user, for example, an upload interface can be provided to the user through a website or an app, and the video the user uploads through that interface is used as the video in step 201.
In practical applications, a number of video frames can be extracted from the video at a preset time interval, and the extracted frames serve as the objects of image recognition. Those skilled in the art can determine the preset time interval according to actual application requirements; for example, it may be the playback duration of N video frames, where N is a positive integer. The embodiments of the present invention do not limit the specific values of N or of the preset time interval.
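For the sampling step, a preset interval of N frames corresponds to N / fps seconds of playback. A minimal sketch, assuming a constant frame rate and zero-based frame indices (the function names are hypothetical):

```python
from typing import List


def sample_frame_indices(total_frames: int, n: int) -> List[int]:
    """Indices of the frames extracted when sampling every N frames."""
    if n <= 0:
        raise ValueError("N must be a positive integer")
    return list(range(0, total_frames, n))


def interval_seconds(n: int, fps: float) -> float:
    """Playback duration of N frames at a constant frame rate."""
    return n / fps
```

For instance, at 25 fps with N = 25, one frame per second of playback is passed to image recognition.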
Image recognition refers to the technology of using machines to process, analyze and understand images in order to recognize image objects of various patterns. In the embodiments of the present invention, a machine can process, analyze and understand video frames to recognize image objects of various patterns. An image object usually occupies a certain image region of the video frame and may be an item, a person, a space, and so on. For example, a person may be a character in the video frame, an item may be something worn by that character, and a space may be the environment the character is in, such as an outdoor or indoor environment; an indoor environment may include information such as walls and floors. The embodiments of the present invention do not limit the specific image objects in a video frame.
In an optional embodiment of the present invention, performing image recognition on a video frame contained in a video may include: detecting the image objects in the video frame, and analyzing the detected image objects with deep learning methods to obtain the corresponding image object information. Accordingly, the image recognition result of the embodiments may include the image object information corresponding to the video frame. The image object information may include the image object image (that is, the image of the image object in the video frame, which usually corresponds to a closed region of the frame) and the recognition result of the image object (such as its recognized name and category). For example, face detection can be used to detect faces in the video frame, and deep learning methods can be used to analyze the faces to obtain information such as the character's gender and age, and possibly also the character's source (for example, which film or TV series the character comes from) or which celebrity the character is. Further, the items worn by the character, such as clothes, shoes, watches and jewelry, can be detected, as can information about the space the character is in.
After step 201 obtains the image recognition result corresponding to the video frame, step 202 can obtain, from the preset item library, the target item that matches the image recognition result.
The preset item library can be used to store first items, and each first item may be associated with feature information and target information. In practical applications, the first items and their feature information and target information can be obtained in cooperation with operators.
The feature information of a first item characterizes the item's features and serves as the basis for matching.
Target information is the information to be added to the video frame. For example, it may be information that attracts users, such as a logo or picture of the first item; as another example, it may be an access entry such as a link, through which the user can enter the page corresponding to the first item.
Examples of first items include commodities such as clothes, shoes, beverages and accessories. Target information may include target information in picture format, such as a logo, product display image or poster, and/or target information in text format. Operators can determine the first items to be recommended and their target information according to actual application requirements; the embodiments of the present invention do not limit the specific first items or their target information.
In addition, having operators provide the first items and their feature information and target information is only an optional embodiment. In fact, those skilled in the art can obtain them in other ways according to actual application requirements, for example from users' historical behavior data. Specifically, a user's features of interest can be derived from the user's historical behavior data, and the first items corresponding to those features can then be obtained; for example, a feature of interest may be a feature of a product the user has purchased, or a feature of the same category as that product. The embodiments of the present invention do not limit the specific way of obtaining the first items and their target information.
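Deriving first items from historical behavior data could, for example, count the categories of items a user has purchased and keep the most frequent ones as features of interest. The event format (`action` and `category` keys), the `top_k` cut-off and the function names below are hypothetical; real systems would likely use richer features than category alone.

```python
from collections import Counter
from typing import Dict, List


def interest_categories(history: List[Dict[str, str]], top_k: int = 2) -> List[str]:
    """Features of interest: the most frequently purchased categories
    (ties keep the order in which categories were first seen)."""
    counts = Counter(e["category"] for e in history if e.get("action") == "purchase")
    return [category for category, _ in counts.most_common(top_k)]


def select_first_items(candidates: List[Dict[str, str]],
                       interests: List[str]) -> List[Dict[str, str]]:
    """Keep candidate items whose category matches a feature of interest."""
    return [c for c in candidates if c["category"] in interests]
```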
In an alternative embodiment of the invention, above-mentioned steps 202 are obtained from pre- placing articles library knows with described image The process for the target item that other result matches may include: judge in described image recognition result whether include with it is described preset The second identical, similar or generic article of first article in article library, if so, using first article as with it is described The target item that image recognition result matches.The embodiment of the present invention can by with the second article for including in image recognition result As target item, therefore the video coverage rate of target information can be improved in the first identical or generic article.For example, " cap 1 " for including in image recognition result and " cap 2 " that includes in pre- placing articles library are identical;For another example, image recognition result In include " Western-style clothes 1 " and pre- placing articles library in include " Western-style clothes 2 " it is similar;For another example, the article for including in pre- placing articles library is " cola ", article is " Sprite " in image recognition result, and classification belonging to " cola " and " Sprite " is the drink of pop can shape Material etc..
In another alternative embodiment of the invention, it is above-mentioned judge in described image recognition result whether include with it is described The first article is identical in pre- placing articles library, similar or generic the second article process, may include: to know described image The characteristic information for the second article that other result includes is matched with the characteristic information of the first article in the pre- placing articles library, with Obtain corresponding matching result;If the matching result is successful match, it is determined that include in described image recognition result and institute State the target item that the first article is identical, similar or generic in pre- placing articles library;Wherein, the characteristic information can wrap It includes: at least one of shape, color and classification.
In practical applications, the shape of the second article can be determined from its contour in the image recognition result. Its color can be determined from its color values, such as RGB (Red Green Blue) values. Its category can be obtained by analyzing the second article with a deep learning method. Any one or more of these features can be used.
Optionally, matching the feature information of the second article included in the image recognition result against that of the first article in the preset article library may include two steps. First, determine the similarity between the two sets of feature information. Then judge whether the similarity satisfies a preset similarity condition. If it does, the matching result is a successful match.
For example, the shape and color of the second article included in the image recognition result can be matched against the shape and color of the first article in the preset article library. If the match succeeds, the first article may be considered to match the second article. Suppose the image recognition result for a frame of a TV series contains a garment with shape "suit shape 1" and color "claret". Suppose also that a first article in a preset article library has shape "suit shape 2" and color "jujube red". The garment in the image recognition result may then be considered to match that first article. The embodiments do not restrict the specific preset similarity condition. For example, the condition may be that the similarity exceeds a similarity threshold, which may be a positive number no greater than 1, such as 0.8.
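The matching described above can be sketched as follows. This is a minimal illustration, not the patent's method. It assumes that shape and category are already available as discrete labels, that color is a single RGB triple, and that the per-feature weights (0.4/0.4/0.2) are arbitrary. Only the threshold of 0.8 comes from the text. The names `ArticleFeatures`, `feature_similarity`, and `is_match` are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class ArticleFeatures:
    shape: str       # hypothetical shape label, e.g. "suit"
    color: tuple     # (R, G, B), each in 0..255
    category: str    # e.g. "clothing"

# Largest possible Euclidean distance in RGB space (black vs. white).
MAX_RGB_DIST = math.dist((0, 0, 0), (255, 255, 255))

def color_similarity(c1, c2):
    """1.0 for identical colors, falling linearly to 0.0 at the maximum RGB distance."""
    return 1.0 - math.dist(c1, c2) / MAX_RGB_DIST

def feature_similarity(a, b, weights=(0.4, 0.4, 0.2)):
    """Weighted combination of shape, color and category similarity (weights are assumptions)."""
    w_shape, w_color, w_category = weights
    return (w_shape * (a.shape == b.shape)
            + w_color * color_similarity(a.color, b.color)
            + w_category * (a.category == b.category))

def is_match(second, first, threshold=0.8):
    """Preset similarity condition: the similarity exceeds the threshold."""
    return feature_similarity(second, first) > threshold

garment = ArticleFeatures("suit", (128, 0, 32), "clothing")        # claret suit in the frame
library_item = ArticleFeatures("suit", (160, 20, 40), "clothing")  # jujube-red suit in the library
print(is_match(garment, library_item))  # True
```

A real system would probably compare contour descriptors and color histograms rather than single labels and triples. The weighted-sum structure and the threshold test carry over unchanged.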
After step 202 obtains, from the preset article library, the target item that matches the image recognition result, step 203 may add the target information corresponding to the target item to the video frame. When a user later watches the video and playback reaches that frame, the target information is then shown to the user.
In an alternative embodiment of the invention, step 203 of adding the target information corresponding to the target item to the video frame may include two steps. First, determine a target position in the video frame for adding the target information. Then add the target information at that target position in the video frame.
In practical applications, the video frame can be analyzed to find, within its image, a target position suitable for adding the target information.
In an alternative embodiment of the invention, the target position may be one that suits the target item, which makes the video look more natural. Correspondingly, determining the target position for adding the target information may include two steps. First, determine the degree of conformity between each existing article in the video frame and the target item. Then, from the existing articles in the video frame, take the position of an article whose degree of conformity satisfies a preset condition as the target position.
Existing articles are the articles already contained in the video frame. In practice, the feature information of an existing article (such as shape, color, name, and category) can be matched against the feature information of the target item (such as shape, color, name, category, brand, and target information) to obtain the degree of conformity between the two. If the degree of conformity satisfies the preset condition, the position of that existing article in the video frame can be used as the target position. Optionally, the preset condition may be that the degree of conformity exceeds a preset threshold.

For example, suppose the target item "cola" is a canned beverage. Image analysis can then locate the articles in the video frame that are can-shaped or bottle-shaped and use their positions as target positions. As another example, suppose the target information of the target item is a brand logo such as "GAP". The positions of clothing, shoes, or hats in the video frame that suit the logo can then be used as target positions. For "GAP", suitable clothing or headwear may be in the casual style associated with the brand. Any position of an article in the video frame that suits the logo falls within the scope of the target position of the embodiments. An article's position suits the logo when that position is an appropriate place to add the logo.
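The conformity-based selection of target positions can be sketched as below. This is a hypothetical illustration. It assumes that an object detector has already returned each existing article's name, a coarse shape label, and a bounding box. The scoring rule (0.7 for a suitable shape, 0.3 for a matching category name) and the 0.5 threshold are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class ExistingArticle:
    name: str     # detected article name or category
    shape: str    # coarse shape label from image analysis
    bbox: tuple   # (x, y, width, height) in pixels

def conformity(article, target):
    """Hypothetical degree-of-conformity score in [0, 1]."""
    score = 0.0
    if article.shape in target["shapes"]:    # e.g. can- or bottle-shaped for a canned drink
        score += 0.7
    if article.name == target["category"]:   # same category name as the target item
        score += 0.3
    return score

def target_positions(articles, target, threshold=0.5):
    """Positions of existing articles whose conformity exceeds the preset threshold."""
    return [a.bbox for a in articles if conformity(a, target) > threshold]

cola = {"category": "soft drink", "shapes": {"can", "bottle"}}
frame_articles = [
    ExistingArticle("soft drink", "can", (40, 120, 30, 60)),
    ExistingArticle("person", "human", (200, 10, 120, 400)),
    ExistingArticle("water", "bottle", (500, 300, 25, 80)),
]
print(target_positions(frame_articles, cola))  # [(40, 120, 30, 60), (500, 300, 25, 80)]
```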
In another alternative embodiment of the invention, the target position may be the position of a preset image target. A preset image target is one that does not interfere with the user's viewing. Such targets may include image targets other than people and the articles people wear: for example, spaces such as walls, floors, elevators, and the sky, or objects such as furniture. Correspondingly, determining the target position for adding the target information may include the following. Identify, in the video frame, a preset image target region suitable for adding the target information, and use that region as the target position.
In one application example of the invention, suppose a video frame contains a large preset image target region, such as a wall, floor, elevator, or wardrobe region. Image recognition can identify this region, and target information such as poster information or a product display image can be inserted into it. A viewer generally does not perceive the content of such a region as something added to the video. The target information can therefore be recommended while reducing both its impact on the video and the user's aversion to it.
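One way to pick a preset image target region is sketched below. The sketch assumes that a semantic segmentation model (not specified in the text) has already produced a per-pixel label map. It also assumes the set of "preset" labels shown, and simply picks the preset label that covers the most pixels. All pixels with the same label are pooled into one bounding box, with no connected-component split, which is a simplification.

```python
# Labels treated as preset image targets that do not disturb viewing (assumed set).
PRESET_LABELS = {"wall", "floor", "elevator", "sky", "wardrobe"}

def largest_preset_region(label_map, min_pixels=1):
    """Return (label, (x0, y0, x1, y1)) for the preset label covering the most pixels.

    label_map is a 2-D list of per-pixel class labels. Returns None if no preset
    label covers at least min_pixels pixels.
    """
    counts, boxes = {}, {}
    for y, row in enumerate(label_map):
        for x, label in enumerate(row):
            if label not in PRESET_LABELS:
                continue
            counts[label] = counts.get(label, 0) + 1
            x0, y0, x1, y1 = boxes.get(label, (x, y, x, y))
            boxes[label] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    if not counts:
        return None
    best = max(counts, key=counts.get)
    return (best, boxes[best]) if counts[best] >= min_pixels else None

label_map = [
    ["wall", "wall", "wall", "person"],
    ["wall", "wall", "wall", "person"],
    ["floor", "floor", "person", "person"],
]
print(largest_preset_region(label_map))  # ('wall', (0, 0, 2, 1))
```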
In practical applications, step 203 may add the target information corresponding to the target item to the video frame in either of the following ways:
Addition manner 1: according to the target information, modify the information at the target position in the video frame to obtain a modified video frame that contains the target information; or
Addition manner 2: add the target information to the video frame as additional information corresponding to the target position in the video frame.
Addition manner 1 adds the target information to the video frame by modifying the information at the target position, so the information in the video frame itself changes.
According to one embodiment, modifying the information at the target position in the video frame may include modifying the pixel values at the target position. Specifically, the first pixel values at the target position in the video frame can be replaced with second pixel values corresponding to the target information. The second pixel values can be determined from target information in picture format. They can also be determined from the color values, such as RGB (Red Green Blue) values, of target information in text format.
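Pixel replacement at a target position can be sketched as follows. For self-containment, frames are plain 2-D lists of RGB tuples; a real implementation would more likely use array slicing on decoded frames. Treating `None` entries in the patch as transparent is an added assumption, so that non-rectangular logos keep the original pixels around them.

```python
def replace_pixels(frame, patch, top, left):
    """Return a copy of `frame` with the pixels under `patch` replaced.

    frame and patch are 2-D lists of (R, G, B) tuples. A None entry in the patch
    is treated as transparent and keeps the original (first) pixel value.
    """
    out = [row[:] for row in frame]
    height, width = len(out), len(out[0]) if out else 0
    for dy, patch_row in enumerate(patch):
        for dx, px in enumerate(patch_row):
            y, x = top + dy, left + dx
            if px is not None and 0 <= y < height and 0 <= x < width:
                out[y][x] = px   # first pixel value -> second pixel value
    return out

black, red = (0, 0, 0), (255, 0, 0)
frame = [[black] * 3 for _ in range(3)]
logo = [[red, None],
        [red, red]]
edited = replace_pixels(frame, logo, top=1, left=1)
```

Pixels of the patch that fall outside the frame are clipped rather than raising an error, so the caller can place a logo partly off-screen.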
According to another embodiment, modifying the information at the target position in the video frame may include modifying the text at the subtitle position in the video frame. The text at the subtitle position is replaced with the target information in text format.
Addition manner 2 adds the target information to the video frame as additional information corresponding to the target position. The additional information may include subtitle information or mask information.
For example, target information in text format can be used as subtitle information at the target position in the video frame. Suppose a person in the video frame is wearing clothes. The target information corresponding to the target item (such as apparel brand A) can be used as subtitle information at the position of the clothes, which recommends apparel brand A. If the clothes worn by the person in the video frame already carry a brand, that brand can be removed by image processing so that two brands are not shown.
A mask is a layer with a certain transparency. Its parameters may include size, display position, and transparency. In the embodiments, the mask overlays the video frame, and its parameters allow the mask to be displayed together with the frame. For example, while the video frame is displayed, the mask can show the target information at the target position in the frame. To reduce the mask's impact on the video frame, the mask can be placed in the region where a preset image target, as described above, is located.
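A mask with size, display position, and transparency can be rendered by standard alpha blending, sketched below. Here `alpha` is the mask's opacity (1 minus its transparency). Every covered pixel becomes `alpha * mask + (1 - alpha) * frame`. The function name and the list-of-tuples frame representation are assumptions for illustration. The source frame is left untouched, mirroring a layer drawn over the video.

```python
def blend_mask(frame, mask, top, left, alpha):
    """Overlay a mask layer at (top, left) with opacity alpha in [0, 1].

    The mask's size is implied by its dimensions. Each covered pixel becomes
    alpha * mask_pixel + (1 - alpha) * frame_pixel, rounded to integers.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    out = [row[:] for row in frame]
    for dy, mask_row in enumerate(mask):
        for dx, m in enumerate(mask_row):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                f = out[y][x]
                out[y][x] = tuple(round(alpha * mc + (1 - alpha) * fc)
                                  for mc, fc in zip(m, f))
    return out

frame = [[(0, 0, 0)] * 2 for _ in range(2)]
shown = blend_mask(frame, [[(200, 100, 50)]], top=1, left=1, alpha=0.5)
print(shown[1][1])  # (100, 50, 25)
```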
Examples of adding the target information corresponding to the target item to the video frame in the embodiments include:
Example 1: replace the first pixel values at the target position in the video frame with the second pixel values corresponding to the target information. For instance, the first pixel values of a first image corresponding to a second article in the video frame can be replaced with the second pixel values of a second image corresponding to a target item of the same category. The second article may be a first beverage in a can or bottle. The target item of the same category may then be a second beverage in a can or bottle. In this way, the picture of the first beverage in the video frame is replaced with a picture of the second beverage.
Example 2: add the logo of the target item at the target position in the video frame, or replace the logo of a second article in the video frame with the logo of the target item. The logo can be added or replaced by modifying pixel values or by using a mask. The target position may suit the logo of the target item. For example, if the logo of the target item is a brand logo, the target position may be a position suitable for adding that logo. Specifically, the logo can be placed on articles of any type the brand covers. The article types covered by the "GAP" logo may include clothes and hats, and those covered by the "NIKE" logo may include clothes, shoes, hats, and luggage.
Example 3: display the target information of the target item at the target position in the video frame through a mask. The target information may be in picture format, such as a logo, product display image, or poster, and/or in text format. Target information displayed through a mask can carry a link, through which the user can reach the page corresponding to the target item.
In some embodiments of the invention, image tracking can also be performed on the image targets in consecutive video frames of a video. Based on the tracking result, an image target in a subsequent video frame can reuse the target item already obtained for the same image target in an earlier frame. This reduces the computation needed to obtain target items. In addition, the repeated appearance of the target item can reinforce the user's memory of it.

For example, suppose video frame i (i is the frame index, an integer greater than or equal to 0) contains a canned beverage 1, and the target item is a canned beverage 2 of the same category. Beverage 1 can then be tracked. If beverage 1 still appears in subsequent frames i+1, i+2, ..., i+M (M is a positive integer), the target information of beverage 2 can be reused for beverage 1 in those frames. This continues until beverage 1 is found to have disappeared in frame i+M+1. As a result, when playback reaches the frames with implanted target information, the user sees the target information of beverage 2 for as long as beverage 1 is on screen.
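The reuse of target items along a track can be sketched as a per-track cache. The sketch assumes an external tracker that assigns a stable `track_id` to each image target across consecutive frames; no tracking algorithm is shown. `match_target_item` stands in for the comparatively expensive library lookup. Both names are hypothetical.

```python
def annotate_frames(frames, match_target_item):
    """Attach a target item to every tracked image target, reusing earlier matches.

    frames: list of frames, each a list of (track_id, image_target) pairs from a tracker.
    match_target_item: library lookup, called only the first time a track is seen.
    Returns (per-frame {track_id: target_item} dicts, number of lookups performed).
    """
    cache = {}
    lookups = 0
    result = []
    for detections in frames:
        per_frame = {}
        for track_id, image_target in detections:
            if track_id not in cache:          # first appearance: query the library
                cache[track_id] = match_target_item(image_target)
                lookups += 1
            per_frame[track_id] = cache[track_id]  # later frames: reuse the match
        result.append(per_frame)                   # a track absent here gets nothing
    return result, lookups

# Beverage 1 (track 7) appears in frames i..i+2 and disappears in frame i+3.
frames = [[(7, "beverage 1")], [(7, "beverage 1")], [(7, "beverage 1")], []]
annotated, n_lookups = annotate_frames(frames, lambda target: "beverage 2")
```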
In some embodiments of the invention, a video being played can be processed in real time. Correspondingly, a first target item can be obtained for the first video frame at the current playback moment. The target information of the first target item can then be added to the second video frame at the next playback moment, where the image recognition result of the second video frame matches the first target item.
It should be noted that, when consecutive video frames contain the same image target, the target item corresponding to that image target may have multiple pieces of target information. Different pieces of target information for the target item can then be added in different frames of the sequence, which diversifies the target information shown for the target item. For example, the different pieces of target information for the same target item may include its logo, display image, poster, or even text information.
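One simple rotation policy for showing different pieces of target information across a run of consecutive frames is sketched below. It assumes each piece stays on screen for a fixed number of frames (`frames_per_item`) before the next one takes over. The text does not specify any particular schedule, so this policy is an assumption.

```python
def info_for_frame(frame_offset, infos, frames_per_item):
    """Pick which piece of target information to show in a given frame of a run.

    frame_offset: 0-based index of the frame within the run of consecutive frames.
    infos: the different pieces of target information of one target item.
    frames_per_item: how many consecutive frames each piece stays on screen (assumed policy).
    """
    return infos[(frame_offset // frames_per_item) % len(infos)]

pieces = ["logo", "display image", "poster"]
schedule = [info_for_frame(k, pieces, frames_per_item=25) for k in (0, 25, 74, 75)]
print(schedule)  # ['logo', 'display image', 'poster', 'logo']
```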
In an embodiment of the present invention, the above method may further include the following steps. Perform text recognition on the video stream of the video to obtain corresponding text information, and/or perform speech recognition on the audio stream of the video to obtain corresponding text information. Obtain, from the preset article library, a target item that matches the text information. Add the target information corresponding to the target item to the video frames corresponding to the video stream and/or the audio stream.
The embodiments may use text recognition technology to recognize text in the video frames corresponding to the video stream and/or the audio stream. Such technology may include OCR (Optical Character Recognition). After preprocessing such as denoising, OCR segments the characters in an image into single-character images and recognizes the character in each one. The embodiments do not restrict the specific text recognition technique. Alternatively, the subtitle file corresponding to the subtitles of the video frame can be obtained, and the subtitle text can be read from that file. As another alternative, a screenshot of the video frame can be taken and text recognition performed on it to obtain the subtitle text. The embodiments do not restrict how the subtitle text is obtained.
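Reading subtitle text from a subtitle file can be sketched as follows. The source does not name a subtitle format, so the sketch assumes the common SubRip (SRT) layout: a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, text lines, and a blank line between cues. `parse_srt` and `subtitle_at` are hypothetical helper names.

```python
import re

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})")

def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(text):
    """Return a list of (start_s, end_s, subtitle_text) cues from SRT-formatted text."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIMING.search(line)
            if m:
                g = m.groups()
                body = " ".join(ln.strip() for ln in lines[i + 1:])
                cues.append((_seconds(*g[:4]), _seconds(*g[4:]), body))
                break
    return cues

def subtitle_at(cues, t):
    """Subtitle text on screen at time t (seconds), or None if there is none."""
    for start, end, text in cues:
        if start <= t < end:
            return text
    return None

srt = ("1\n00:00:01,000 --> 00:00:03,500\nThere's my favorite\nThree Squirrels\n\n"
       "2\n00:00:04,000 --> 00:00:05,000\nBye\n")
cues = parse_srt(srt)
```

The cue text at a frame's timestamp can then go straight into the text-matching step, with no OCR needed.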
The embodiments may use speech recognition technology to convert the audio stream of the video into text information. Denote the audio stream as $S$. After a series of processing steps, $S$ yields a speech feature sequence $O = \{O_1, O_2, \ldots, O_i, \ldots, O_T\}$, where $O_i$ is the $i$-th speech feature and $T$ is the total number of speech features. The sentence corresponding to $S$ can be regarded as a word string made up of many words, denoted $W = \{w_1, w_2, \ldots, w_n\}$. Speech recognition is the process of finding the most probable word string $W$ given the known speech feature sequence $O$.
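The "most probable word string" criterion in the paragraph above is conventionally written with Bayes' rule as follows. The split into an acoustic model $P(O \mid W)$ and a language model $P(W)$ is the standard textbook formulation and is not spelled out in the source. $P(O)$ does not depend on $W$ and can be dropped from the maximization.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```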
Specifically, speech recognition is a model-matching process. First, a speech model is built from human speech features, and the required features are extracted by analyzing the input speech signal to build the templates that recognition needs. Recognizing the user's input speech then means comparing its features with these templates. The template that best matches the input speech is finally determined, which yields the speech recognition result. Specific algorithms may include training and recognition based on statistical hidden Markov models, training and recognition based on neural networks, and recognition based on dynamic time warping (DTW), among others. The embodiments do not restrict the specific speech recognition process.
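Template matching with dynamic time warping, one of the options listed above, can be sketched as follows. For brevity, each feature is a single number; real systems compare feature vectors such as MFCC frames, and only the `cost` line would change. The templates and labels are made up for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local distance between two features
            d[i][j] = cost + min(d[i - 1][j],      # input advances, template waits
                                 d[i][j - 1],      # template advances, input waits
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]

def recognize(features, templates):
    """Return the label whose template has the smallest DTW distance to the input."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))

templates = {"yes": [1, 3, 4, 3, 1], "no": [0, 0, 5, 5, 0]}
spoken = [1, 1, 3, 4, 4, 3, 1]   # a time-stretched "yes"
print(recognize(spoken, templates))  # yes
```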
Obtaining, from the preset article library, the target item that matches the text information may include the following. Judge whether the text information includes information that matches the feature information of a first article in the preset article library, or of an article of the same category as the first article. If so, take the first article as the target item that matches the text information.
Optionally, the feature information may include at least one of name, brand, category, and advertising slogan. Text information matches feature information if all or part of the text has the same characters as, the same meaning as, a similar meaning to, or a related meaning to the feature information. Optionally, text vectors can be computed separately for the text information and the feature information, and semantic similarity can be judged from the similarity between the two vectors. The embodiments do not restrict what counts as a match between text information and feature information, or the matching method used.
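The two kinds of match named above, identical characters and vector similarity, can be sketched as below. As a crude stand-in for learned text embeddings, the sketch uses bag-of-words count vectors compared by cosine similarity. The 0.5 threshold is an arbitrary assumption. Plain substring matching can also fire inside longer words, which a production matcher would guard against.

```python
import math
from collections import Counter

def cosine_similarity(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def text_vector(text):
    """Bag-of-words vector; a crude stand-in for a learned text embedding."""
    return Counter(text.lower().split())

def matches(text, feature_values, threshold=0.5):
    """True if a feature value (name, brand, category, slogan) appears verbatim in the
    text, or if its vector is similar enough to the text's vector."""
    lowered = text.lower()
    for value in feature_values:
        if value.lower() in lowered:               # same characters
            return True
        if cosine_similarity(text_vector(text), text_vector(value)) >= threshold:
            return True                            # similar wording
    return False

print(matches("There's my favorite Three Squirrels", ["Three Squirrels", "snacks"]))  # True
```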
In application example 1 of the invention, suppose the subtitle of a video frame contains the text "There's my favorite Three Squirrels". The text can be matched against the name, brand, category, and other feature information of the first articles in the preset article library. Because the text contains information that matches the feature information of a first article, a target item with the brand "Three Squirrels" can be obtained. A target item with the brand "Liangpin Puzi" can also be obtained, because "Liangpin Puzi" is in the same category as "Three Squirrels".
In application example 2 of the invention, suppose the subtitle of a video frame contains the text "I want to live an excellent life". The text can be matched against the advertising slogans of the first articles in the preset article library. Suppose the matching result shows that the text matches the slogan of a certain beverage, "Youth means staying awake and striving". That beverage can then be used as the target item.
In application example 3 of the invention, suppose the image of a video frame contains the text "GAP", that is, a person in the image wears an article (such as clothes, a hat, or a school bag) bearing the "GAP" logo. The text can be matched against the name, brand, category, and other feature information of the first articles in the preset article library. Because the text contains information that matches the feature information of a first article, a target item with the brand "GAP" can be obtained. A target item with the brand "Uniqlo" can also be obtained, because the category of "Uniqlo" is the same as or similar to that of "GAP".
In the embodiments of the present application, the audio stream may correspond to one or more video frames. In practice, the target information of the target item may be added to all of the video frames corresponding to the audio stream, or only to some of them. Optionally, target video frames suitable for adding the target information can first be selected from the frames corresponding to the audio stream, and the target information then added to those target frames. Optionally, the video frames corresponding to the text that matches the target item can be used as target video frames, which synchronizes the video picture with the target information. For example, if the text that matches the target item is a certain line of dialogue in the video, the frames corresponding to that line can be used as target video frames suitable for adding the target information. The embodiments do not restrict the specific target video frames. For instance, a target video frame may be a frame after the one corresponding to the matching text. If the matching text falls at the end of a line of dialogue, the next frame after that line can be the target video frame.
In an alternative embodiment of the invention, adding the target information of the target item to the video frames corresponding to the audio stream may include three steps. First, select, from the video frames corresponding to the audio stream, target video frames suitable for adding the target information. Second, determine the target position in each target video frame for adding the target information. Third, add the target information at the target position in the target video frame.
The target video frames may include the video frames corresponding to the text that matches the target item. Specifically, selecting target video frames suitable for adding the target information from the frames corresponding to the audio stream may include three steps. First, take the information in the recognition result that matches the feature information of the target item as the target recognition result. Second, extract the part of the audio stream corresponding to the target recognition result as the target audio. Third, take the video frames corresponding to the target audio as the target video frames. Here, the recognition result is the text obtained from the audio stream by speech recognition.

In practice, the audio stream can be fairly long, and so can the recognized text. The target recognition result, such as the target text within the recognized text, is therefore located first using the target item's feature information. The target audio is then extracted from the audio stream. Finally, the target video frames corresponding to the target audio are located using the synchronization between the video stream and the audio stream.
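Mapping matched text back to video frames can be sketched as follows. The sketch assumes the speech recognizer returns word-level timestamps (not every recognizer does). It also assumes the video stream shares the audio's time axis, so a time t falls in frame `floor(t * fps)`. The function name `target_frames` is hypothetical.

```python
import math

def target_frames(words, keyword, fps):
    """Map the spoken occurrence of `keyword` to video frame indices.

    words: list of (word, start_s, end_s) from a recognizer with word timestamps (assumed).
    keyword: the target text, e.g. a brand name, possibly several words long.
    fps: frame rate of the video stream, which shares the audio's time axis.
    """
    tokens = keyword.lower().split()
    spoken = [w.lower() for w, _, _ in words]
    for i in range(len(spoken) - len(tokens) + 1):
        if spoken[i:i + len(tokens)] == tokens:          # target recognition result
            start = words[i][1]                          # target audio starts here
            end = words[i + len(tokens) - 1][2]          # ... and ends here
            return list(range(math.floor(start * fps), math.ceil(end * fps)))
    return []

words = [("there's", 0.0, 0.5), ("my", 0.5, 0.5), ("favorite", 0.5, 1.0),
         ("three", 1.0, 1.5), ("squirrels", 1.5, 2.0)]
frames = target_frames(words, "Three Squirrels", fps=25)
print(frames[0], frames[-1], len(frames))  # 25 49 25
```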
It should be noted that, when there are multiple target video frames, the target position for adding the target information can be determined separately for each of them. This helps avoid the user missing the target information because a single target video frame is displayed only briefly.
In an alternative embodiment of the invention, the method for the embodiment of the present invention can also include: according to the target Information modifies to the audio stream, to obtain the modified audio stream to match with the target information.Wherein, it repairs It may include the audio to match with target information in audio stream after changing, for example, it is assumed that certain section of lines of video are " to have me most Three squirrels liked ", it is assumed that target item is " non-defective unit shop ", then can be " to have me by the corresponding audio modification of the lines Favorite non-defective unit shop ".
According to one embodiment, speech synthesis can be performed on the target information to obtain target audio. The target audio then replaces the audio in the audio stream that matches the target item, and the resulting stream serves as the modified audio stream.
Speech synthesis is also known as text-to-speech (TTS), the technology of converting text into speech. One example is HMM-based speech synthesis (HTS, HMM-based Speech Synthesis System), where HMM stands for hidden Markov model. HTS works as follows. The speech signal is parameterized and decomposed, and an HMM is built for each acoustic parameter. At synthesis time, the trained HMMs predict the acoustic parameters of the text to be synthesized. These parameters are fed into a parametric synthesizer, which produces the synthesized speech. The acoustic parameters may include spectral parameters and/or fundamental-frequency (F0) parameters.
According to another embodiment, modifying the audio stream may include four steps. First, obtain the speech features of the audio stream. Second, use those speech features to synthesize speech from the target information, obtaining the target audio. Third, replace the audio in the audio stream that matches the target item with the target audio. Fourth, use the resulting stream as the modified audio stream. In this embodiment, the speech features can be used to determine the acoustic parameters for synthesis. This keeps the replaced audio consistent in speech characteristics with the audio that was not replaced.
Optionally, the speech features may include voiceprint features. A voiceprint is the spectrum of sound waves carrying verbal information, as displayed by electroacoustic instruments. Voiceprints are both specific to a speaker and relatively stable. The embodiments synthesize speech for the target information using the voiceprint features of the audio stream. The synthesized target audio then matches the original voice in the audio stream, which preserves the integrity of the video content.
In an alternative embodiment of the invention, the modified audio stream can be aligned on the time axis with the audio stream before modification (the original audio stream). This alignment keeps the modified stream consistent with the original stream on the time axis, so modifying the audio does not disturb audio/video synchronization. For example, suppose the first audio in the original stream corresponds to the text "There's my favorite Three Squirrels". Suppose the second audio in the modified stream corresponds to the modified text "There's my favorite Liangpin Puzi". The timing of the first audio in the original stream is then kept consistent with the timing of the second audio in the modified stream. Specifically, the first and second audio have the same duration. In addition, the start and end times of the first audio in the original stream equal the start and end times of the second audio in the modified stream.
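The splice-with-alignment step can be sketched as follows. The synthesized target audio is first fitted to exactly the sample span of the audio it replaces, so durations, start times, and end times all stay the same. The sketch uses naive linear-interpolation resampling, which also shifts pitch. A production system would more likely ask the synthesizer for the target duration or use a pitch-preserving time-stretch. Samples are plain float lists for self-containment.

```python
def fit_length(samples, n):
    """Linearly resample `samples` to exactly n samples (naive; changes pitch)."""
    if n <= 0:
        return []
    if not samples:
        return [0.0] * n
    if len(samples) == 1:
        return [samples[0]] * n
    if n == 1:
        return [samples[0]]
    scale = (len(samples) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * scale
        j = int(pos)
        frac = pos - j
        if j + 1 < len(samples):
            out.append(samples[j] * (1 - frac) + samples[j + 1] * frac)
        else:
            out.append(samples[j])
    return out

def splice_aligned(original, start, end, replacement):
    """Replace original[start:end] with `replacement` stretched to the same length,
    so every sample outside the span keeps its original time position."""
    return original[:start] + fit_length(replacement, end - start) + original[end:]

original = [0.0] * 10
modified = splice_aligned(original, 3, 6, [1.0, 1.0, 1.0, 1.0])
```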
In an alternative embodiment of the invention, the method for the embodiment of the present invention can also include: to obtain locating for equipment The corresponding object language in geographic area and the geographic area;It is translated as the corresponding text information of audio stream to meet institute State the target text information of object language;By target text information addition in the corresponding video frame of the audio stream.Its In, equipment can be equipment used by a user, and the embodiment of the present invention can be for geographic area locating for user, by audio stream Corresponding text information (such as lines, the lyrics) carries out machine translation, and different language user may be implemented in this way to be understood The purpose of video content.The granularity of above-mentioned geographic area can be country etc., in this way, for the user in American-European region, it can The corresponding text information of audio stream is translated as English from a kind of language (such as Chinese).Certainly, the granularity of above-mentioned geographic area It can also be provinces and cities etc., in this way, the corresponding text information of audio stream can be translated as some area from a kind of language (such as Chinese) The dialect (such as northeast dialect, Sichuan dialect, Guangdong dialect) in domain.
In summary, the video processing method of the embodiments uses machines to automatically identify the information in video frames. It obtains the target item in the preset article library that matches the image recognition result and adds the target information of that target item to the video frame. Because the target item that matches a frame's image recognition result can be obtained quickly without manual intervention, the method shortens video processing time and improves video processing efficiency.
Moreover, with shorter processing time per video, the number of videos that can be processed per unit time can grow by orders of magnitude. The number of machines processing video can also be scaled out through computing clusters, which increases the video coverage of the target information.
Further, the embodiment of the present invention carries out video processing by the way of image recognition and pre- placing articles storehouse matching, this Sample in the case that the information in the pre- placing articles library changes, can obtain newest mesh based on pre- placing articles storehouse matching Article and its corresponding target information are marked, therefore the timeliness for the target information added in the video frame can be improved, or even can To realize the real-time update of target information to a certain extent.
It should be noted that for simple description, therefore, it is stated as a series of movement is dynamic for embodiment of the method It combines, but those skilled in the art should understand that, the embodiment of the present invention is not by the limit of described athletic performance sequence System, because according to an embodiment of the present invention, some steps may be performed in other sequences or simultaneously.Secondly, art technology Personnel also should be aware of, and the embodiments described in the specification are all preferred embodiments, and related athletic performance is simultaneously different It surely is necessary to the embodiment of the present invention.
Device embodiment
Referring to Fig. 3, a structural block diagram of a video processing device embodiment of the present invention is shown. The device may specifically include: an image recognition module 301, a target item acquisition module 302, and a target information adding module 303.
The image recognition module 301 is configured to perform image recognition on a video frame included in a video, so as to obtain an image recognition result corresponding to the video frame;
the target item acquisition module 302 is configured to obtain, from a preset item library, a target item that matches the image recognition result;
the target information adding module 303 is configured to add the target information corresponding to the target item to the video frame.
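The data flow between the three modules can be sketched as a small pipeline in which each module is a pluggable callable. This is an illustrative skeleton only: the `Frame` and `VideoProcessingDevice` classes, the label-list recognition result, and the `info_for` lookup are hypothetical simplifications, not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class Frame:
    """Hypothetical minimal frame: an index, detected labels, and added info."""
    index: int
    labels: List[str] = field(default_factory=list)
    added_info: List[str] = field(default_factory=list)


@dataclass
class VideoProcessingDevice:
    recognize: Callable[[Frame], List[str]]      # stands in for image recognition module 301
    match: Callable[[List[str]], Optional[str]]  # stands in for target item acquisition module 302
    info_for: Callable[[str], str]               # target-information lookup used by module 303

    def process(self, frames: Iterable[Frame]) -> List[Frame]:
        out = []
        for frame in frames:
            result = self.recognize(frame)   # image recognition result
            target = self.match(result)      # target item from the preset library, or None
            if target is not None:
                # Target information adding module 303: attach the item's information.
                frame.added_info.append(self.info_for(target))
            out.append(frame)
        return out
```

Keeping each module as an injected callable mirrors the module split described above and lets the recognizer, the library matcher, and the insertion strategy be replaced independently.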
Optionally, the target item acquisition module 302 may include:
a judging submodule, configured to judge whether the image recognition result includes a second item that is identical or similar to, or of the same category as, a first item in the preset item library, and if so, to take the first item as the target item matching the image recognition result.
Optionally, the judging submodule may include:
a matching unit, configured to match the feature information of the second item included in the image recognition result against the feature information of the first item in the preset item library, so as to obtain a corresponding matching result;
a target item determination unit, configured to determine, if the matching result indicates a successful match, that the image recognition result includes a target item identical or similar to, or of the same category as, the first item in the preset item library;
where the feature information may include at least one of shape, color, and category.
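The matching unit can be illustrated by comparing the shape, color, and category of a detected item with each library entry, using only the features present on both sides. This is a sketch under assumptions: the `Features` class, the agreement-fraction score, and the `threshold` parameter are hypothetical choices, since the text does not specify how "identical", "similar", or "same category" are scored.

```python
from dataclasses import dataclass
from typing import Dict, Optional

FIELDS = ("shape", "color", "category")


@dataclass(frozen=True)
class Features:
    """Feature information of an item; any field may be unknown (None)."""
    shape: Optional[str] = None
    color: Optional[str] = None
    category: Optional[str] = None


def match_score(detected: Features, candidate: Features) -> float:
    """Fraction of features known on both sides that agree (0.0 if none comparable)."""
    compared = agreed = 0
    for name in FIELDS:
        a, b = getattr(detected, name), getattr(candidate, name)
        if a is None or b is None:
            continue  # at least one feature is required, not all of them
        compared += 1
        agreed += a == b
    return agreed / compared if compared else 0.0


def find_target_item(
    detected: Features, library: Dict[str, Features], threshold: float = 0.5
) -> Optional[str]:
    """Return the best-matching library item (the 'first item'), or None if no match."""
    best_name, best_score = None, 0.0
    for name, feats in library.items():
        score = match_score(detected, feats)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

With a library containing a red cylindrical beverage can, a detection that is only known to be red and cylindrical still matches it, while a green toy sphere matches nothing and returns `None`.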
Optionally, the target information adding module 303 may include:
a target position determination submodule, configured to determine a target position in the video frame for adding the target information;
an adding submodule, configured to add the target information at the target position in the video frame.
Optionally, the target position determination submodule may include:
a first target position determination unit, configured to determine the degree of conformity between the existing items in the video frame and the target item, and to take, as the target position, the position of an existing item in the video frame whose degree of conformity meets a preset condition; and/or
a second target position determination unit, configured to identify a predicted image target region in the video frame that is suitable for adding the target information, and to take the predicted image target region as the target position.
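The two position-determination units can be combined as follows: prefer the position of the existing item with the highest conformity that meets the preset condition, and otherwise fall back to a predicted image target region. This is a hedged sketch: the `conformity` function, the `(x, y, width, height)` box format, the `threshold` value, and the fallback order are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed (x, y, width, height) in pixels


def choose_target_position(
    existing: List[Tuple[str, Box]],
    target_category: str,
    conformity: Callable[[str, str], float],
    threshold: float = 0.6,
    predicted_region: Optional[Box] = None,
) -> Optional[Box]:
    """Pick where to add the target information in a frame.

    existing: (category, box) for each item already visible in the frame.
    conformity: hypothetical score in [0, 1] of how well an existing item
        suits the target item (e.g. a cup suits a beverage advertisement).
    """
    scored = [(conformity(cat, target_category), box) for cat, box in existing]
    eligible = [item for item in scored if item[0] >= threshold]  # preset condition
    if eligible:
        return max(eligible, key=lambda item: item[0])[1]
    # No existing item qualifies: use the predicted region, if one was found.
    return predicted_region
```

Treating the two units as "first choice, then fallback" is only one reading of the "and/or" in the text; a deployment could equally use either unit alone.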
Optionally, the target position is a subtitle-related position;
the adding submodule may include:
a subtitle modification unit, configured to modify a subtitle included in the video frame according to the target information, so as to add the target information to the subtitle included in the video frame; and/or
a subtitle adding subunit, configured to add the target information, as additional information of the subtitle, around the subtitle in the video frame, so as to add the target information to the video frame.
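The two subtitle strategies differ in whether the subtitle itself is edited or a separate caption is placed next to it. The sketch below assumes a hypothetical `Subtitle` record with top-left image coordinates; the separator, the gap, and placing the extra caption directly above the subtitle are illustrative choices, not details from the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Subtitle:
    """Hypothetical subtitle record; (x, y) is the top-left corner, origin at top-left."""
    text: str
    x: int
    y: int
    height: int


def embed_in_subtitle(sub: Subtitle, info: str, sep: str = " | ") -> Subtitle:
    """Subtitle modification: write the target information into the subtitle text."""
    return replace(sub, text=f"{sub.text}{sep}{info}")


def place_around_subtitle(sub: Subtitle, info: str, gap: int = 4) -> Subtitle:
    """Additional information: a separate caption placed just above the subtitle."""
    y = max(0, sub.y - sub.height - gap)  # clamp so the caption stays inside the frame
    return Subtitle(text=info, x=sub.x, y=y, height=sub.height)
```

Because `Subtitle` is frozen, both functions return new records and leave the original subtitle untouched, which makes it easy to render either variant from the same source.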
Optionally, the target information adding module 303 may include:
a video frame information modification submodule, configured to modify, according to the target information, the information corresponding to the target position in the video frame, so as to obtain a modified video frame that includes the target information; or
an additional submodule, configured to add the target information to the video frame as additional information corresponding to the target position in the video frame.
Optionally, the video frame information modification submodule may include:
a pixel value modification unit, configured to replace the first pixel values corresponding to the target position in the video frame with second pixel values corresponding to the target information, where the second pixel values are determined according to the color values of target information in picture format and/or target information in text format; and/or
a text modification unit, configured to modify the text information corresponding to the subtitle position in the video frame, by changing the text information corresponding to the subtitle position into target information in text format.
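Pixel-value replacement for picture-format target information can be illustrated with a frame stored as rows of RGB tuples: every frame pixel covered by the target information's patch (a first pixel value) is overwritten with the patch's color (a second pixel value). This is a minimal sketch, assuming plain Python lists instead of a real image library, `None` entries as transparent patch pixels, and clipping at the frame border; text-format information would first need to be rendered into such a patch, which is not shown.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]              # (R, G, B)
Image = List[List[Pixel]]                 # frame[row][column]
Patch = List[List[Optional[Pixel]]]       # None = transparent, keep the frame pixel


def overlay_patch(frame: Image, patch: Patch, x: int, y: int) -> Image:
    """Return a copy of `frame` with pixels covered by `patch` replaced.

    (x, y) is the top-left corner of the target position; parts of the patch
    that fall outside the frame are clipped.
    """
    out = [row[:] for row in frame]  # copy rows so the input frame is unchanged
    height = len(frame)
    width = len(frame[0]) if frame else 0
    for dy, patch_row in enumerate(patch):
        for dx, value in enumerate(patch_row):
            fy, fx = y + dy, x + dx
            if value is None or not (0 <= fy < height and 0 <= fx < width):
                continue
            out[fy][fx] = value  # first pixel value -> second pixel value
    return out
```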
Optionally, the device may further include:
an image tracking module, configured to perform image tracking on image objects in consecutive video frames included in the video;
a target information reuse module, configured to reuse, for an image object in a subsequent video frame and according to the image tracking result, the target information corresponding to the same image object in a previous video frame.
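The reuse step amounts to caching target information per tracked object, so that the library lookup runs once per object rather than once per frame. In this sketch the tracker itself is assumed to exist upstream and to supply a stable `track_id` for the same image object across frames; `lookup` is a hypothetical stand-in for recognition plus library matching.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Detection = Tuple[Hashable, str]  # (track_id from an assumed tracker, item label)


def annotate_with_reuse(
    frames: Iterable[List[Detection]],
    lookup: Callable[[str], Optional[str]],
) -> List[Dict[Hashable, Optional[str]]]:
    """Assign target information to tracked objects, reusing earlier results.

    Returns one {track_id: target_info} dict per frame.
    """
    cache: Dict[Hashable, Optional[str]] = {}
    results = []
    for detections in frames:
        per_frame = {}
        for track_id, label in detections:
            if track_id not in cache:
                cache[track_id] = lookup(label)  # only for newly seen objects
            per_frame[track_id] = cache[track_id]  # reuse for the same object
        results.append(per_frame)
    return results
```

For a three-frame clip in which a cola can appears in frames 1 and 2 and a sofa in frames 2 and 3, `lookup` is called only twice.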
Since the device embodiment is basically similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding parts of the method embodiment.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
For the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the related method embodiments and will not be elaborated here.
An embodiment of the present invention provides a device for video processing, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations: performing image recognition on a video frame included in a video, so as to obtain an image recognition result corresponding to the video frame; obtaining, from a preset item library, a target item that matches the image recognition result; and adding the target information corresponding to the target item to the video frame.
Optionally, obtaining, from the preset item library, the target item that matches the image recognition result includes:
judging whether the image recognition result includes a second item that is identical or similar to, or of the same category as, a first item in the preset item library, and if so, taking the first item as the target item matching the image recognition result.
Optionally, judging whether the image recognition result includes a second item that is identical or similar to, or of the same category as, a first item in the preset item library includes:
matching the feature information of the second item included in the image recognition result against the feature information of the first item in the preset item library, so as to obtain a corresponding matching result;
if the matching result indicates a successful match, determining that the image recognition result includes a target item identical or similar to, or of the same category as, the first item in the preset item library;
where the feature information includes at least one of shape, color, and category.
Optionally, adding the target information corresponding to the target item to the video frame includes:
determining a target position in the video frame for adding the target information;
adding the target information at the target position in the video frame.
Optionally, determining the target position in the video frame for adding the target information includes:
determining the degree of conformity between the existing items in the video frame and the target item, and taking, as the target position, the position of an existing item in the video frame whose degree of conformity meets a preset condition; and/or
identifying a predicted image target region in the video frame that is suitable for adding the target information, and taking the predicted image target region as the target position.
Optionally, the target position is a subtitle-related position;
adding the target information at the target position in the video frame includes:
modifying a subtitle included in the video frame according to the target information, so as to add the target information to the subtitle included in the video frame; and/or
adding the target information, as additional information of the subtitle, around the subtitle in the video frame, so as to add the target information to the video frame.
Optionally, adding the target information corresponding to the target item to the video frame includes:
modifying, according to the target information, the information corresponding to the target position in the video frame, so as to obtain a modified video frame that includes the target information; or
adding the target information to the video frame as additional information corresponding to the target position in the video frame.
Optionally, modifying the information corresponding to the target position in the video frame includes:
replacing the first pixel values corresponding to the target position in the video frame with second pixel values corresponding to the target information, where the second pixel values are determined according to the color values of target information in picture format and/or target information in text format; and/or
modifying the text information corresponding to the subtitle position in the video frame, by changing the text information corresponding to the subtitle position into target information in text format.
Optionally, the device is further configured such that the one or more programs executed by the one or more processors include instructions for performing the following operations:
performing image tracking on image objects in consecutive video frames included in the video;
reusing, for an image object in a subsequent video frame and according to the image tracking result, the target information corresponding to the same image object in a previous video frame.
Fig. 4 is a block diagram of a device 900 for video processing, implemented as a terminal, according to an exemplary embodiment. For example, the device 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, and the like.
Referring to Fig. 4, the device 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 typically controls the overall operations of the device 900, such as operations associated with display, telephone calls, data communication, camera operations, and recording. The processing component 902 may include one or more processors 920 to execute instructions so as to perform all or some of the steps of the methods described above. In addition, the processing component 902 may include one or more modules that facilitate interaction between the processing component 902 and other components. For example, the processing component 902 may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation of the device 900. Examples of such data include instructions for any application or method operated on the device 900, contact data, phonebook data, messages, pictures, videos, and so on. The memory 904 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 906 supplies power to the various components of the device 900. The power component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 900.
The multimedia component 908 includes a screen that provides an output interface between the device 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the device 900 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC) configured to receive external audio signals when the device 900 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 904 or transmitted via the communication component 916. In some embodiments, the audio component 910 further includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and so on. The buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the device 900. For example, the sensor component 914 may detect the open/closed state of the device 900 and the relative positioning of components, such as the display and keypad of the device 900; the sensor component 914 may also detect a change in position of the device 900 or of one of its components, the presence or absence of user contact with the device 900, the orientation or acceleration/deceleration of the device 900, and a change in temperature of the device 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the device 900 and other devices. The device 900 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In exemplary embodiments, the device 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.
In exemplary embodiments, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 904 including instructions, which may be executed by the processor 920 of the device 900 to perform the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 5 is a schematic structural diagram of a server in some embodiments of the present invention. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. A program stored in the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may further include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
A non-transitory computer-readable storage medium is also provided: when the instructions in the storage medium are executed by a processor of a device (a terminal or a server), the device is enabled to perform a video processing method, the method including: performing image recognition on a video frame included in a video, so as to obtain an image recognition result corresponding to the video frame; obtaining, from a preset item library, a target item that matches the image recognition result; and adding the target information corresponding to the target item to the video frame.
Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the invention indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
The video processing method, the video processing device, and the device for video processing provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help understand the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (12)

1. A video processing method, characterized by comprising:
performing image recognition on a video frame included in a video, so as to obtain an image recognition result corresponding to the video frame;
obtaining, from a preset item library, a target item that matches the image recognition result; and
adding target information corresponding to the target item to the video frame.
2. The method according to claim 1, wherein obtaining, from the preset item library, the target item that matches the image recognition result comprises:
judging whether the image recognition result includes a second item that is identical or similar to, or of the same category as, a first item in the preset item library, and if so, taking the first item as the target item matching the image recognition result.
3. The method according to claim 2, wherein judging whether the image recognition result includes a second item that is identical or similar to, or of the same category as, a first item in the preset item library comprises:
matching feature information of the second item included in the image recognition result against feature information of the first item in the preset item library, so as to obtain a corresponding matching result;
if the matching result indicates a successful match, determining that the image recognition result includes a target item identical or similar to, or of the same category as, the first item in the preset item library;
wherein the feature information comprises at least one of shape, color, and category.
4. The method according to claim 1, wherein adding the target information corresponding to the target item to the video frame comprises:
determining a target position in the video frame for adding the target information;
adding the target information at the target position in the video frame.
5. The method according to claim 4, wherein determining the target position in the video frame for adding the target information comprises:
determining a degree of conformity between existing items in the video frame and the target item, and taking, as the target position, the position of an existing item in the video frame whose degree of conformity meets a preset condition; and/or
identifying a predicted image target region in the video frame that is suitable for adding the target information, and taking the predicted image target region as the target position.
6. The method according to claim 4, wherein the target position is a subtitle-related position;
adding the target information at the target position in the video frame comprises:
modifying a subtitle included in the video frame according to the target information, so as to add the target information to the subtitle included in the video frame; and/or
adding the target information, as additional information of the subtitle, around the subtitle in the video frame, so as to add the target information to the video frame.
7. The method according to claim 1, wherein adding the target information corresponding to the target item to the video frame comprises:
modifying, according to the target information, information corresponding to a target position in the video frame, so as to obtain a modified video frame that includes the target information; or
adding the target information to the video frame as additional information corresponding to a target position in the video frame.
8. The method according to claim 7, wherein modifying the information corresponding to the target position in the video frame comprises:
replacing first pixel values corresponding to the target position in the video frame with second pixel values corresponding to the target information, wherein the second pixel values are determined according to color values of target information in picture format and/or target information in text format; and/or
modifying text information corresponding to a subtitle position in the video frame, by changing the text information corresponding to the subtitle position into target information in text format.
9. The method according to any one of claims 1 to 8, wherein the method further comprises:
performing image tracking on image objects in consecutive video frames included in the video;
reusing, for an image object in a subsequent video frame and according to the image tracking result, the target information corresponding to the same image object in a previous video frame.
10. A video processing device, characterized by comprising:
an image recognition module, configured to perform image recognition on a video frame included in a video, so as to obtain an image recognition result corresponding to the video frame;
a target item acquisition module, configured to obtain, from a preset item library, a target item that matches the image recognition result; and
a target information adding module, configured to add target information corresponding to the target item to the video frame.
11. A device for video processing, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
performing image recognition on a video frame included in a video, so as to obtain an image recognition result corresponding to the video frame;
obtaining, from a preset item library, a target item that matches the image recognition result; and
adding target information corresponding to the target item to the video frame.
12. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause a device to perform the video processing method according to one or more of claims 1 to 9.
CN201710737274.2A 2017-08-24 2017-08-24 Video processing method and device for video processing Active CN109429078B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710737274.2A CN109429078B (en) 2017-08-24 2017-08-24 Video processing method and device for video processing

Publications (2)

Publication Number Publication Date
CN109429078A true CN109429078A (en) 2019-03-05
CN109429078B CN109429078B (en) 2022-02-22

Family

ID=65500340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710737274.2A Active CN109429078B (en) 2017-08-24 2017-08-24 Video processing method and device for video processing

Country Status (1)

Country Link
CN (1) CN109429078B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140357280A1 (en) * 1996-12-16 2014-12-04 Ip Holdings, Inc. Image Networks For Mobile Communication
CN104811744A (en) * 2015-04-27 2015-07-29 北京视博云科技有限公司 Information putting method and system
CN105528715A (en) * 2014-10-16 2016-04-27 三星电子株式会社 Method for providing additional information related to broadcast content and electronic device implementing the same
CN105681918A (en) * 2015-09-16 2016-06-15 乐视致新电子科技(天津)有限公司 Method and system for presenting article relevant information in video stream

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110278466B (en) * 2019-06-06 2021-08-06 浙江口碑网络技术有限公司 Short video advertisement putting method, device and equipment
CN110278466A (en) * 2019-06-06 2019-09-24 浙江口碑网络技术有限公司 Put-on method, device and the equipment of short video ads
CN110489593A (en) * 2019-08-20 2019-11-22 腾讯科技(深圳)有限公司 Topic processing method, device, electronic equipment and the storage medium of video
CN110489593B (en) * 2019-08-20 2023-04-28 腾讯科技(深圳)有限公司 Topic processing method and device for video, electronic equipment and storage medium
CN110505498A (en) * 2019-09-03 2019-11-26 腾讯科技(深圳)有限公司 Processing, playback method, device and the computer-readable medium of video
CN110582021A (en) * 2019-09-26 2019-12-17 深圳市商汤科技有限公司 Information processing method and device, electronic equipment and storage medium
US11587593B2 (en) 2019-11-04 2023-02-21 Beijing Bytedance Network Technology Co., Ltd. Method and apparatus for displaying music points, and electronic device and medium
CN110769309A (en) * 2019-11-04 2020-02-07 北京字节跳动网络技术有限公司 Method, apparatus, electronic device, and medium for presenting music points
WO2021114552A1 (en) * 2019-12-11 2021-06-17 北京市商汤科技开发有限公司 Information processing method and apparatus, electronic device and storage medium
CN112669204A (en) * 2021-01-04 2021-04-16 北京金山云网络技术有限公司 Image processing method, and training method and device of image processing model
CN112669204B (en) * 2021-01-04 2024-05-03 北京金山云网络技术有限公司 Image processing method, training method and device of image processing model
CN114125556A (en) * 2021-11-12 2022-03-01 深圳麦风科技有限公司 Video data processing method, terminal and storage medium
CN114125556B (en) * 2021-11-12 2024-03-26 深圳麦风科技有限公司 Video data processing method, terminal and storage medium

Also Published As

Publication number Publication date
CN109429078B (en) 2022-02-22

Similar Documents

Publication Publication Date Title
CN110019961A (en) Method for processing video frequency and device, for the device of video processing
CN109429078A (en) Method for processing video frequency and device, for the device of video processing
US8442389B2 (en) Electronic apparatus, reproduction control system, reproduction control method, and program therefor
CN110531860A (en) A kind of animating image driving method and device based on artificial intelligence
CN108933970A (en) The generation method and device of video
CN109447234A (en) A kind of model training method, synthesis are spoken the method and relevant apparatus of expression
CN109189987A (en) Video searching method and device
CN108231059A (en) Treating method and apparatus, the device for processing
CN108833969A (en) A kind of clipping method of live stream, device and equipment
CN110121093A (en) The searching method and device of target object in video
CN109547854A (en) A kind of TV method for pushing, smart television and storage medium based on Application on Voiceprint Recognition
CN109429077A (en) Method for processing video frequency and device, for the device of video processing
CN114401417B (en) Live stream object tracking method, device, equipment and medium thereof
CN107864410B (en) Multimedia data processing method and device, electronic equipment and storage medium
CN110322760A (en) Voice data generation method, device, terminal and storage medium
CN112185389A (en) Voice generation method and device, storage medium and electronic equipment
CN113409764B (en) Speech synthesis method and device for speech synthesis
CN109801618A (en) A kind of generation method and device of audio-frequency information
CN110162598A (en) A kind of data processing method and device, a kind of device for data processing
CN108628813A (en) Treating method and apparatus, the device for processing
WO2021136334A1 (en) Video generating method and apparatus, electronic device, and computer readable storage medium
US20240022772A1 (en) Video processing method and apparatus, medium, and program product
CN109784537A (en) Predictor method, device and the server and storage medium of ad click rate
CN108717403A (en) A kind of processing method, device and the device for processing
CN109429084A (en) Method for processing video frequency and device, for the device of video processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant