CN108235114A

CN108235114A - Content analysis method and system for video streams, electronic device, and storage medium

Info

Publication number: CN108235114A
Application number: CN201711066691.5A
Authority: CN
Inventors: 吴军; 栗鹏; 戴皓文; 吕晗; 曾仕元
Original assignee: Shenzhen Sensetime Technology Co Ltd
Current assignee: Shenzhen Sensetime Technology Co Ltd
Priority date: 2017-11-02
Filing date: 2017-11-02
Publication date: 2018-06-29

Abstract

Embodiments of the present disclosure disclose a content analysis method and system for video streams, an electronic device, and a storage medium. The system comprises a central processing unit (CPU) and multiple graphics processing units (GPUs). The CPU is configured to obtain a video stream, divide the video stream into multiple sub-video streams corresponding to the multiple GPUs, and distribute the multiple sub-video streams to the corresponding GPUs. Each GPU is configured to perform content analysis processing on the video images in its corresponding sub-video stream, obtain the content analysis result of that sub-video stream, and report the result to the CPU. The CPU is further configured to obtain the content analysis result of the video stream according to the content analysis results reported by the multiple GPUs. The embodiments of the present disclosure achieve intelligent analysis of video content and, by processing the video stream in parallel on multiple GPUs, improve processing speed and processing efficiency.

Description

Content analysis method and system for video streams, electronic device, and storage medium

Technical field

The present disclosure relates to computer vision technology, and in particular to a content analysis method and system for video streams, an electronic device, and a storage medium.

Background

With the development and spread of video services, demand for understanding and annotating video content has grown. However, most of the prior art still relies on manual annotation, which is inefficient and occupies a large amount of human resources. Film and television videos in particular, such as films and television series, contain a wide variety of scenes and items, which makes manual annotation even more challenging.

Summary

The embodiments of the present disclosure provide a content analysis method and system for video streams, and an electronic device.

According to one aspect of the embodiments of the present disclosure, a content analysis system for video streams is provided, comprising:

a central processing unit (CPU) and multiple graphics processing units (GPUs), wherein

the CPU is configured to obtain a video stream, divide the video stream into multiple sub-video streams corresponding to the multiple GPUs, and distribute the multiple sub-video streams to the corresponding GPUs, wherein each sub-video stream includes at least one frame of consecutive video images in the video stream, and different GPUs correspond to different sub-video streams;

each GPU is configured to perform content analysis processing on the video images in the sub-video stream corresponding to that GPU, obtain the content analysis result of the corresponding sub-video stream, and report the content analysis result of the corresponding sub-video stream to the CPU;

the CPU is further configured to obtain the content analysis result of the video stream according to the content analysis results reported by the multiple GPUs.

In another embodiment based on the above system of the present disclosure, the CPU dividing the video stream into multiple sub-video streams corresponding to the multiple GPUs includes:

the CPU dividing the video stream into multiple sub-video streams corresponding to the multiple GPUs according to the number of the multiple GPUs.

In another embodiment based on the above system of the present disclosure, the content analysis result includes at least one of the following: person information, clothing information, item information, and scene information.

In another embodiment based on the above system of the present disclosure, the clothing information includes at least one of the following: category information, color information, texture information, neckline information, and cuff information of the clothing, and image coordinate information of the clothing; and/or

the face information includes at least one of the following: name information, face attribute information, and image coordinate information of the face.

In another embodiment based on the above system of the present disclosure, the GPU includes a face recognition module, configured to:

obtain the person information library of the video stream, where the person information library of the video stream includes face images and name information of the persons in the video stream;

determine, using the person information library of the video stream, the face information of at least one frame of video image in the sub-video stream corresponding to the GPU.

In another embodiment based on the above system of the present disclosure, the face recognition module is specifically configured to:

perform face detection on each frame of video image in the at least one frame of video image of the corresponding sub-video stream to obtain at least one face image in each frame of video image, wherein face images that correspond to the same person and appear in at least one frame of consecutive video images in the sub-video stream have the same face tracking identifier;

determine a target face image from the at least one face image in the sub-video stream that corresponds to the same face tracking identifier;

perform face comparison between the target face image and the face images in the person information library of the video stream to determine the person corresponding to the face tracking identifier of the target face image.

In another embodiment based on the above system of the present disclosure, the GPU further includes a clothing recognition module, configured to:

perform clothing detection processing on each frame of video image according to the face images that the face recognition module detected in each frame of video image of the at least one frame of video image of the sub-video stream, to obtain at least one clothing image in each frame of video image, wherein clothing images in the sub-video stream that correspond to the same face tracking identifier and to the same item of clothing have the same clothing tracking identifier;

determine the clothing corresponding to the clothing tracking identifier according to the at least one clothing image in the sub-video stream that has the same clothing tracking identifier.

In another embodiment based on the above system of the present disclosure, the clothing recognition module is further configured to establish an association between the clothing tracking identifier and the face tracking identifier, and the content analysis result includes the person information corresponding to the face tracking identifier and the clothing information corresponding to the clothing tracking identifier associated with that face tracking identifier.

In another embodiment based on the above system of the present disclosure, the CPU is further configured to render the content analysis result of the video stream in the video stream.

In another embodiment based on the above system of the present disclosure, the CPU obtaining a video stream includes:

the CPU obtaining a video stream uploaded by a user over the Internet.

In another embodiment based on the above system of the present disclosure, the CPU is further configured to, before obtaining the video stream uploaded by the user over the Internet, perform user authentication on the user in response to the user's video upload request.

According to one aspect of the embodiments of the present disclosure, an electronic device is provided, comprising:

a communication unit, configured to, in response to a received user request, send a video stream to a content analysis system and receive the content analysis result of the video stream sent by the content analysis system;

a storage unit, configured to store the content analysis result of the video stream.

In another embodiment based on the above electronic device of the present disclosure, the electronic device further includes:

a rendering unit, configured to render the content analysis result of the video stream in the video stream and display the rendered result.

According to one aspect of the embodiments of the present disclosure, a content analysis method for video streams is provided, applied to a content analysis system that includes a central processing unit (CPU) and multiple graphics processing units (GPUs), the method comprising:

the CPU obtaining a video stream, dividing the video stream into multiple sub-video streams corresponding to the multiple GPUs, and distributing the multiple sub-video streams to the corresponding GPUs, wherein each sub-video stream includes at least one frame of consecutive video images in the video stream, and different GPUs correspond to different sub-video streams;

each GPU performing content analysis processing on the video images in the sub-video stream corresponding to that GPU, obtaining the content analysis result of the corresponding sub-video stream, and reporting the content analysis result of the corresponding sub-video stream to the CPU;

the CPU obtaining the content analysis result of the video stream according to the content analysis results reported by the multiple GPUs.

In another embodiment based on the above method of the present disclosure, dividing the video stream into multiple sub-video streams corresponding to the multiple GPUs includes:

dividing the video stream into multiple sub-video streams corresponding to the multiple GPUs according to the number of the multiple GPUs.

In another embodiment based on the above method of the present disclosure, the content analysis result includes at least one of the following: person information, clothing information, item information, and scene information.

In another embodiment based on the above method of the present disclosure, the clothing information includes at least one of the following: category information, color information, texture information, neckline information, and cuff information of the clothing, and image coordinate information of the clothing; and/or

In another embodiment based on the above method of the present disclosure, the GPU performing content analysis processing on the video images in the sub-video stream corresponding to the GPU includes:

In another embodiment based on the above method of the present disclosure, determining, using the person information library of the video stream, the face information of at least one frame of video image in the sub-video stream corresponding to the GPU includes:

In another embodiment based on the above method of the present disclosure, the GPU performing content analysis processing on the video images in the sub-video stream corresponding to the GPU further includes:

performing clothing detection processing on each frame of video image according to the face images that the face recognition module detected in each frame of video image of the at least one frame of video image of the sub-video stream, to obtain at least one clothing image in each frame of video image, wherein clothing images in the sub-video stream that correspond to the same face tracking identifier and to the same item of clothing have the same clothing tracking identifier;

In another embodiment based on the above method of the present disclosure, the method further includes:

establishing an association between the clothing tracking identifier and the face tracking identifier, wherein the content analysis result includes the person information corresponding to the face tracking identifier and the clothing information corresponding to the clothing tracking identifier associated with that face tracking identifier.

In another embodiment based on the above method of the present disclosure, the method further includes:

the CPU rendering the content analysis result of the video stream in the video stream.

In another embodiment based on the above method of the present disclosure, the CPU obtaining a video stream includes:

the CPU obtaining a video stream uploaded by a user over the Internet.

In another embodiment based on the above method of the present disclosure, before the CPU obtains the video stream uploaded by the user over the Internet, the method further includes:

performing user authentication on the user in response to the user's video upload request.

According to one aspect of the embodiments of the present disclosure, an electronic device is provided, comprising: a memory, configured to store executable instructions;

and a processor, configured to communicate with the memory to execute the executable instructions so as to complete the operations of the content analysis method for video streams described above.

According to one aspect of the embodiments of the present disclosure, a computer storage medium is provided for storing computer-readable instructions, wherein the instructions, when executed, perform the operations of the content analysis method for video streams described above.

According to one aspect of the embodiments of the present disclosure, a computer program is provided, including computer-readable code, wherein, when the computer-readable code runs on a device, a processor in the device executes instructions for implementing each step of the content analysis method for video streams described above.

Based on the content analysis method and system for video streams and the electronic device provided by the embodiments of the present disclosure, the CPU divides the video stream into multiple sub-video streams corresponding to multiple GPUs and distributes them to the corresponding GPUs, and the multiple GPUs perform content analysis processing on their respective sub-video streams in parallel, thereby improving processing speed and processing efficiency.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of the content analysis system for video streams provided by an embodiment of the present disclosure.

Fig. 2 is a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure.

Fig. 3 is a schematic flowchart of the content analysis method for video streams provided by an embodiment of the present disclosure.

Fig. 4 is a schematic structural diagram of an electronic device for implementing a terminal device or server of the embodiments of the present application.

Detailed description

Various exemplary embodiments of the present disclosure are now described in detail with reference to the drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.

It should also be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.

The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use.

Techniques, methods, and devices known to a person of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the specification.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be discussed further in subsequent drawings.

The embodiments of the present disclosure may be applied to computer systems/servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with a computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.

A computer system/server may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. In general, program modules may include routines, programs, object programs, components, logic, data structures, and so on, which perform specific tasks or implement specific abstract data types. A computer system/server may be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media that include storage devices.

Fig. 1 is an exemplary schematic structural diagram of the content analysis system for video streams provided by an embodiment of the present disclosure. As shown in Fig. 1, the system includes a central processing unit (CPU) 110 and multiple graphics processing units (GPUs) 120.

Specifically, the CPU 110 is configured to obtain a video stream. Optionally, the video stream may include multiple frames of video images. For example, the video stream may be a film or a television series; the present disclosure does not limit this.

The CPU 110 is further configured to divide the video stream into multiple sub-video streams corresponding to the multiple GPUs 120 and distribute the multiple sub-video streams to the corresponding GPUs 120, wherein each sub-video stream includes at least one frame of consecutive video images in the video stream, and different GPUs 120 correspond to different sub-video streams.

Optionally, one GPU 120 may correspond to one or more sub-video streams, and the number of sub-video streams corresponding to different GPUs 120 may be the same or different. As an example, the number of sub-video streams may be an integer multiple of the number of GPUs 120, in which case, optionally, different GPUs 120 may correspond to the same number of sub-video streams. For example, the number of sub-video streams may equal the number of GPUs 120, and the sub-video streams may be in one-to-one correspondence with the GPUs 120, but the embodiments of the present disclosure are not limited thereto.
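
The correspondence between sub-video streams and GPUs described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `assign_substreams` and the round-robin policy are hypothetical and do not come from the disclosure, which requires only that each sub-video stream be handled by one GPU.

```python
from typing import Dict, List


def assign_substreams(num_substreams: int, num_gpus: int) -> Dict[int, List[int]]:
    """Map each GPU index to the sub-video stream indices it will analyze.

    Round-robin assignment is an assumed policy. When the number of
    sub-video streams is an integer multiple of the number of GPUs,
    every GPU receives the same number of sub-video streams.
    """
    if num_gpus <= 0:
        raise ValueError("at least one GPU is required")
    assignment: Dict[int, List[int]] = {gpu: [] for gpu in range(num_gpus)}
    for substream in range(num_substreams):
        assignment[substream % num_gpus].append(substream)
    return assignment
```

With as many sub-video streams as GPUs, the mapping reduces to the one-to-one correspondence mentioned in the text.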

The GPU 120 is configured to perform content analysis processing on the video images in the sub-video stream corresponding to the GPU 120, obtain the content analysis result of the corresponding sub-video stream, and report the content analysis result of the corresponding sub-video stream to the CPU 110.

Each of the multiple GPUs 120 may perform content analysis processing on its corresponding sub-video stream. Optionally, each GPU 120 may use the same procedure for content analysis processing; for ease of understanding, the operation of one of the GPUs 120 is described below as an example.

Optionally, the content analysis result may include at least one of the following: person information, clothing information, item information, and scene information.

As an example, the clothing information may include at least one of the following: category information, color information, texture information, neckline information, and cuff information of the clothing, and image coordinate information of the clothing.

The category information of the clothing may indicate the clothing category, such as coat, shirt, and so on. The image coordinate information of the clothing may indicate the position of the clothing in the video image. Optionally, the clothing information may also include other information, which the embodiments of the present disclosure do not limit.

As an example, the face information may include at least one of the following: name information, face attribute information, and image coordinate information of the face.

The name information may specifically be the name of a person in the video, such as the character's name, or the person's real name or stage name, and so on. The attribute information may include face similarity information, the person's gender information, the person's age information, and so on. The image coordinate information of the face may indicate the position of the face image in the video image. Optionally, the face information may also include other information, which the embodiments of the present disclosure do not limit.

As an example, the item information may include at least one of the following: item name information, item attribute information, and image coordinate information of the item.

The item attribute information may include information such as the material and brand of the item. Optionally, the item information may also include other information, which the embodiments of the present disclosure do not limit.

As an example, the scene information may include scene name information, such as beach, airport, and so on, which the embodiments of the present disclosure do not limit.
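
The four kinds of information listed above could be held in a structured record along the following lines. This is a sketch under assumptions, not the format used by the disclosure: the class names, field names, and the `(x, y, width, height)` box convention are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed convention for image coordinates: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class FaceInfo:
    name: str                                   # character, real, or stage name
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. gender, age
    box: Optional[Box] = None                   # position of the face image


@dataclass
class ClothingInfo:
    category: str                               # e.g. "coat", "shirt"
    color: Optional[str] = None
    texture: Optional[str] = None
    neckline: Optional[str] = None
    cuff: Optional[str] = None
    box: Optional[Box] = None                   # position of the clothing image


@dataclass
class ItemInfo:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. material, brand
    box: Optional[Box] = None


@dataclass
class FrameResult:
    """Content analysis result for one video image (hypothetical layout)."""
    frame_index: int
    faces: List[FaceInfo] = field(default_factory=list)
    clothing: List[ClothingInfo] = field(default_factory=list)
    items: List[ItemInfo] = field(default_factory=list)
    scene: Optional[str] = None                 # e.g. "beach", "airport"
```

Every field other than the identifying name or category is optional, mirroring the "at least one of the following" wording in the text.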

Optionally, the GPU 120 may perform content analysis processing on the video images in the sub-video stream by means of a convolutional neural network, thereby automating the understanding of video image content in the sub-video stream, overcoming the drawbacks of manual content annotation, and improving the efficiency of content understanding.

In addition, persons, clothing, items, and scenes may each be recognized by a separately trained convolutional neural network. Through intelligent analysis of video content, structured information such as faces, clothing, items, and scenes is effectively extracted from the video and systematically combined, thereby achieving intelligent analysis and structured output of the video content.

The CPU 110 is further configured to obtain the content analysis result of the video stream according to the content analysis results reported by the multiple GPUs 120.

In the content analysis system for video streams provided by the above embodiment of the present disclosure, the CPU divides the video stream into multiple sub-video streams, and the sub-video streams are then analyzed in parallel on the respective GPUs, which effectively increases processing speed.
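
The dispatch-and-merge flow described above might look like the following sketch. Worker threads stand in for GPUs here, and `analyze` is a hypothetical placeholder for the per-GPU content analysis; both are assumptions for illustration only, not the disclosure's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def analyze_in_parallel(
    substreams: Sequence[Sequence[int]],
    analyze: Callable[[Sequence[int]], List[str]],
    num_workers: int,
) -> List[str]:
    """Analyze sub-video streams concurrently and merge the reported results.

    Each worker plays the role of one GPU (an assumption). Executor.map
    yields results in the order of its inputs, so the merged list follows
    the original order of the sub-video streams.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        per_substream = list(pool.map(analyze, substreams))
    merged: List[str] = []
    for results in per_substream:
        merged.extend(results)
    return merged
```

Because the sub-video streams consist of consecutive frames, concatenating the per-stream results in stream order gives a result for the whole video stream in frame order.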

Optionally, in a specific example, the CPU 110 may divide the video stream equally into multiple sub-video streams corresponding to the multiple GPUs 120 according to the number of GPUs 120.

Specifically, in a multi-GPU content analysis system, the CPU 110 may divide the video dynamically according to the number of currently available GPUs. The CPU 110 may divide the video stream into multiple sub-video streams; for example, the CPU 110 may divide the video stream equally into multiple sub-video streams in one-to-one correspondence with the multiple GPUs 120, but the embodiments of the present disclosure are not limited thereto. Optionally, the CPU 110 may also divide the video stream in other ways, which the embodiments of the present disclosure do not limit.
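
Equal division by GPU count can be illustrated with a short sketch. The function name `split_frames` and the use of frame indices as the unit of division are assumptions; the disclosure does not say how a remainder is handled, so the sketch simply gives the first few GPUs one extra frame.

```python
from typing import List, Tuple


def split_frames(total_frames: int, num_gpus: int) -> List[Tuple[int, int]]:
    """Return one half-open (start, end) range of consecutive frames per GPU.

    Frames are divided as evenly as possible. If there are fewer frames
    than GPUs, some ranges are empty and a caller would skip them.
    """
    if num_gpus <= 0:
        raise ValueError("at least one GPU is required")
    base, extra = divmod(total_frames, num_gpus)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for gpu in range(num_gpus):
        length = base + (1 if gpu < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges
```

Passing the number of currently available GPUs as `num_gpus` gives the dynamic division mentioned in the text.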

Optionally, the CPU 110 may distribute each sub-video stream to its corresponding GPU, and the GPU performs frame extraction and decoding on the received sub-video stream to obtain the at least one frame of video image that the sub-video stream includes, and then performs content analysis processing on each frame of video image, such as at least one of face recognition processing, clothing recognition processing, item recognition processing, and scene recognition processing. Alternatively, the CPU 110 may first perform frame extraction and decoding on each sub-video stream to obtain the at least one frame of video image that each sub-video stream includes, and transmit those video images to the corresponding GPU 120; in that case, the GPU 120 can perform content analysis processing directly on each received frame of video image. The embodiments of the present disclosure do not limit this.

In an optional example of the present disclosure, the GPU 120 may include a face recognition module, configured to obtain the person information library of the video stream and, using that library, determine the face information of at least one frame of video image in the sub-video stream.

Specifically, the person information library of the video stream may include face images and name information of the persons in the video stream, and may further include other information. Optionally, if the video stream is a film, a television drama, or another video clip featuring known persons, detection may be based on the person information library corresponding to that video stream, to improve content analysis efficiency.

Optionally, face detection may be performed on each frame of video image in the sub-video stream by a convolutional neural network. Face detection yields the position of each face in the video image; based on that position information, the face image is cropped out of the video image and compared with the face images in the person information library to obtain the required face information, thereby achieving face recognition based on video images. Optionally, face recognition may also be implemented by means other than convolutional neural networks, which the embodiments of the present disclosure do not limit.
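
The comparison step against the person information library can be sketched as a nearest-match search. The disclosure does not specify how faces are compared; the sketch assumes each face image has already been converted to a feature vector and uses cosine similarity with a hypothetical threshold of 0.5. The names `identify_face` and `cosine_similarity` are illustrative.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_face(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # assumed value, not taken from the disclosure
) -> Tuple[Optional[str], float]:
    """Return the best-matching name in the library and its similarity.

    The name is None when no library entry reaches the threshold.
    """
    best_name: Optional[str] = None
    best_score = -1.0
    for name, reference in library.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

Restricting the library to the persons known to appear in the video stream keeps the search small, which is consistent with the efficiency benefit the text attributes to a per-video library.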

As an example, the face recognition module may perform face detection on each frame of video image in the at least one frame of video image of the corresponding sub-video stream to obtain at least one face image in each frame of video image, wherein face images that correspond to the same person and appear in at least one frame of consecutive video images in the sub-video stream have the same face tracking identifier. Further, the face recognition module determines a target face image from the at least one face image in the sub-video stream that corresponds to the same face tracking identifier, and performs face comparison between the target face image and the face images in the person information library of the video stream to determine the face information corresponding to that face tracking identifier.

The face recognition module may assign a face tracking identifier to each face obtained by face detection and track the face based on that identifier. Different faces correspond to different face tracking identifiers. The same face appearing in consecutive frames of video images may correspond to the same face tracking identifier; optionally, if the same face appears in non-consecutive frames of video images, it may correspond to different face tracking identifiers, but the embodiments of the present disclosure do not limit this.
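
One simple way to assign face tracking identifiers across consecutive frames is to match each detected box to a box from the previous frame by overlap. The disclosure does not name a tracking algorithm; the intersection-over-union (IoU) matching, the 0.3 threshold, and the function names below are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) corners


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_track_ids(frames: List[List[Box]], min_iou: float = 0.3) -> List[List[int]]:
    """Give each face box a tracking identifier, frame by frame.

    A box inherits the identifier of the best-overlapping box in the
    immediately preceding frame; otherwise it receives a new identifier.
    """
    next_id = 0
    previous: Dict[int, Box] = {}
    all_ids: List[List[int]] = []
    for boxes in frames:
        current: Dict[int, Box] = {}
        frame_ids: List[int] = []
        for box in boxes:
            best_id: Optional[int] = None
            best_overlap = min_iou
            for track_id, prev_box in previous.items():
                overlap = iou(box, prev_box)
                if track_id not in current and overlap >= best_overlap:
                    best_id, best_overlap = track_id, overlap
            if best_id is None:
                best_id = next_id
                next_id += 1
            current[best_id] = box
            frame_ids.append(best_id)
        previous = current
        all_ids.append(frame_ids)
    return all_ids
```

Because only the immediately preceding frame is consulted, a face that disappears and later reappears receives a new identifier, which matches the optional behavior for non-consecutive frames described above.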

Specifically, for a given face tracking identifier, feature values may be extracted from the face image in each frame of the at least one frame of consecutive video images in the sub-video stream that corresponds to that identifier, and the confidence of the face image in each frame may be determined from the extracted feature values. The target face image corresponding to the face tracking identifier may then be determined from the confidences of the face images in the individual frames. For example, the face image with the highest confidence among the at least one face image corresponding to the face tracking identifier may be taken as the target face image for that identifier, but the embodiments of the present disclosure do not limit this. The target face image can be regarded as the face image of the person that is of the best quality in the sub-video stream or most suitable for face comparison. Comparing the target face image with the face images in the person information library of the video stream avoids, as far as possible, misjudgments caused by face image quality and/or face angle, and effectively improves the accuracy and efficiency of face recognition.
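
Selecting the highest-confidence face image for each face tracking identifier can be sketched as below. The `FaceDetection` record and its fields are hypothetical; how the confidence itself is computed from the extracted feature values is left open by the text and is not modeled here.

```python
from typing import Dict, Iterable, NamedTuple


class FaceDetection(NamedTuple):
    track_id: int        # face tracking identifier
    frame_index: int     # frame in which the face image was detected
    confidence: float    # assumed to be computed elsewhere from feature values


def select_target_faces(detections: Iterable[FaceDetection]) -> Dict[int, FaceDetection]:
    """For each tracking identifier, keep the detection with the highest confidence."""
    best: Dict[int, FaceDetection] = {}
    for detection in detections:
        current = best.get(detection.track_id)
        if current is None or detection.confidence > current.confidence:
            best[detection.track_id] = detection
    return best
```

Only one face image per identifier then needs to be compared with the person information library, rather than one per frame.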

Optionally, the GPU 120 further includes a clothing recognition module, configured to perform clothing detection processing on the video images in the sub-video stream.

Specifically, the clothing recognition module may perform clothing detection processing on each frame of video image according to the face images that the face recognition module detected in each frame of the at least one frame of video image of the sub-video stream, to obtain at least one clothing image in each frame of video image.

Optionally, a clothing tracking identifier may be assigned to each item of clothing of each person in the sub-video stream, and the clothing may be tracked and recognized based on that identifier. As an example, clothing images in the sub-video stream that correspond to the same face tracking identifier and to the same item of clothing have the same clothing tracking identifier. Because viewers are generally more interested in what the persons in a video stream are wearing, the clothing recognition module may perform clothing detection based on the face detection results of the face recognition module. That is, clothing that appears on its own in a video image, such as clothes displayed in a shopping mall, is not recognized. Because the clothing recognition module detects only clothing matched to a face image, it obtains more useful clothing information, namely what the persons are wearing, and reduces the detection and recognition load on the system.

After the clothing tracking identifiers are assigned, the clothing recognition module may determine the clothing information corresponding to a clothing tracking identifier according to the at least one clothing image in the sub-video stream that has that identifier.

Specifically, for a given clothing tracking identifier, a vote may be taken over the clothing images corresponding to that identifier in the at least one frame of video image of the sub-video stream, and the clothing with the most votes is selected as the clothing corresponding to the identifier. For example, the clothing image corresponding to the clothing tracking identifier in each frame of video image may be determined, and a candidate clothing for the identifier may be determined from each such clothing image. All frames of video images in the sub-video stream in which a clothing image corresponding to the identifier appears may then be analyzed statistically to count how many times each candidate clothing occurs, and the candidate clothing that occurs most often in the sub-video stream is determined to be the clothing corresponding to the clothing tracking identifier. Optionally, clothing recognition may also be performed in other ways, which the embodiments of the present disclosure do not limit.
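
The voting procedure described above amounts to a per-identifier majority count, which can be sketched as follows. The function name `vote_clothing` and the representation of a candidate clothing as a string label are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def vote_clothing(per_frame_candidates: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """Pick, for each clothing tracking identifier, the most frequent candidate.

    Each input pair is (clothing tracking identifier, candidate clothing label)
    produced for one frame of video image.
    """
    votes: DefaultDict[int, Counter] = defaultdict(Counter)
    for track_id, label in per_frame_candidates:
        votes[track_id][label] += 1
    return {
        track_id: counter.most_common(1)[0][0]
        for track_id, counter in votes.items()
    }
```

Voting across frames means a single misclassified frame does not change the clothing reported for an identifier, as long as the correct candidate appears more often.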

Optionally, the clothing recognition module can also establish an association between a clothing tracking identifier and a face tracking identifier. Correspondingly, the content parsing result includes the person information corresponding to the face tracking identifier and the clothing information corresponding to the clothing tracking identifier associated with that face tracking identifier. This makes the content parsing result easier to visualize.

Specifically, the clothing images detected by the clothing recognition module are associated with facial images. The clothing recognition module can therefore establish the association based on the positional relationship between the facial image and the clothing image (or on other information from which the positions of the facial image and the clothing image can be determined). The output content parsing result then includes not only the information about the person recognized in the video image but also the clothing information associated with that person, giving the user more structured information.

Specifically, the association can be determined from the coordinate information of the detected clothing image and the coordinate information of the detected facial image. When the distance (for example, the Euclidean distance) between a detected clothing image and a detected facial image is less than a set value, the clothing image is considered to correspond to the facial image; that is, the person corresponding to the detected facial image is wearing the garment corresponding to the detected clothing image. The embodiments of the present disclosure are not limited to this.
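The distance test above can be sketched as follows. The disclosure does not say which points the distance is measured between, so this sketch assumes bounding boxes given as `(x1, y1, x2, y2)` and measures the Euclidean distance between box centers; the function names and the nearest-face tie-breaking rule are illustrative assumptions.

```python
import math


def box_center(box):
    """Center point of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def associate_clothing(face_boxes, clothing_boxes, max_distance):
    """Link each clothing track to the nearest face track within max_distance.

    face_boxes / clothing_boxes: dicts mapping a tracking identifier to a box.
    Returns a dict {clothing_id: face_id}; clothing with no face closer than
    max_distance is left unlinked.
    """
    links = {}
    for clothing_id, clothing_box in clothing_boxes.items():
        cx, cy = box_center(clothing_box)
        best_face, best_dist = None, max_distance
        for face_id, face_box in face_boxes.items():
            fx, fy = box_center(face_box)
            dist = math.hypot(cx - fx, cy - fy)  # Euclidean distance
            if dist < best_dist:  # strictly less than the set value
                best_face, best_dist = face_id, dist
        if best_face is not None:
            links[clothing_id] = best_face
    return links


faces = {"f1": (100, 50, 140, 90)}
clothes = {"c1": (90, 100, 150, 220), "c2": (400, 100, 460, 220)}
print(associate_clothing(faces, clothes, max_distance=150))
```

In this example only `c1` lies within the set distance of the face, so `c2` (for instance, a garment on a distant mannequin) stays unassociated.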

In an example of the present disclosure, the GPU 120 can further include a scene recognition module, configured to:

perform shot segmentation on the sub-video stream to obtain at least one shot and at least one frame of consecutive video images corresponding to each shot; and

perform scene recognition on each frame of the at least one frame of consecutive video images corresponding to each shot, to obtain the scene information corresponding to each shot.

Optionally, when multiple scenes are recognized in one shot (for example, when the same scene is misidentified as several scenes), an average confidence is calculated for each recognized scene, and the scene information corresponding to the shot is obtained by comparing these average confidences.

For scene recognition, shot segmentation is performed first. For the consecutive frames within one shot, multiple scenes and their confidences are detected for each frame. The confidences of each scene across all consecutive frames of the shot are summed and averaged, and if the average exceeds a preset threshold, that scene is output. A single shot may therefore output multiple scenes, for example: sea, beach.

Specifically, scene recognition can be performed on each frame of the at least one frame of consecutive video images corresponding to a shot, to obtain at least one scene for each frame and a confidence for each of those scenes. Then, for a given recognized scene, the target confidence of the scene can be determined from the scene's confidence in each frame of the consecutive video images of the shot. For example, the target confidence of a scene can be the average confidence obtained by averaging the scene's confidence over the frames of the shot, although the embodiments of the present disclosure are not limited to this. Optionally, after the target confidence of each of the multiple scenes corresponding to the shot is obtained, the target scene of the shot can be determined according to those target confidences. For example, if the target confidence of a scene is equal to or higher than a preset threshold, the scene is determined to be a target scene of the shot, although the embodiments of the present disclosure are not limited to this.
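The per-shot averaging can be sketched as below. The text does not state whether the average is taken over all frames of the shot or only over the frames in which a scene was detected; this sketch assumes the former (a frame that does not report a scene contributes a confidence of 0). The function name `shot_scenes` and the input layout are hypothetical.

```python
from collections import defaultdict


def shot_scenes(frame_scores, threshold):
    """Average each scene's confidence over a shot and keep scenes that pass.

    frame_scores: list with one dict per frame of the shot, mapping a scene
    label to the confidence reported for that frame.
    Returns {scene: average_confidence} for scenes whose average is
    equal to or higher than threshold.
    """
    if not frame_scores:
        return {}
    totals = defaultdict(float)
    for scores in frame_scores:
        for scene, confidence in scores.items():
            totals[scene] += confidence
    frame_count = len(frame_scores)  # assumption: divide by all frames in the shot
    averages = {scene: total / frame_count for scene, total in totals.items()}
    return {scene: avg for scene, avg in averages.items() if avg >= threshold}


shot = [
    {"sea": 0.9, "beach": 0.7},
    {"sea": 0.8, "beach": 0.6, "forest": 0.2},
    {"sea": 0.7, "beach": 0.8},
]
print(sorted(shot_scenes(shot, threshold=0.5)))
```

Here both "sea" and "beach" pass the threshold while the isolated "forest" detection is discarded, matching the remark that one shot may output several scenes.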

In an example of the embodiments of the present disclosure, the GPU 120 can perform item detection on each frame of video image of the sub-video stream to obtain at least one item image per frame, and then track and recognize the item images corresponding to the same item across at least one frame of video image of the sub-video stream, to determine the item information corresponding to the item images.

In this way, items in the video images are recognized and structured information about the items in the sub-video stream is obtained.

Optionally, the CPU 110 can obtain a video stream uploaded by a user over the Internet.

Specifically, the video stream received by the CPU 110 can be uploaded by the user over the Internet or uploaded locally. As long as the upload means is legitimate, the CPU 110 can receive the video stream; the present disclosure does not limit the upload method.

For example, the user can open a browser and access the content parsing system through the web. The browser can be integrated with the server or separate from it; the examples of the present disclosure do not limit this.

Optionally, the CPU 110 is further configured to authenticate the user in response to the user's video upload request, before obtaining the video stream uploaded by the user over the Internet.

Specifically, the CPU 110 can implement video data management functions and is responsible for user management and authorization, managing user permissions, and so on. In practice, after a user request is received, the user's permissions are checked: only requests sent by users with permission are processed, and requests sent by users without permission are answered directly with a rejection message.

Optionally, the user here can specifically be a terminal or a video service provider; correspondingly, the content parsing system can specifically be a server or a content provider. The embodiments of the present disclosure do not limit this.

Optionally, the system can also feed the content parsing result of the video stream back to the user over a network (such as the Internet), where the content parsing result can optionally be delivered as a file or in another form. Optionally, the system can render the content parsing result into the video stream and can further display the video stream with the rendered content parsing result. The embodiments of the present disclosure are not limited to this.

As an example, a real-time rendering technology (for example, a web rendering technology) can be used to render the content parsing result of the video stream in the browser in real time, improving the user experience. Web-based visualization thus presents the content parsing result more intuitively.

Optionally, the system can also further process the video stream according to its content parsing result, for example by displaying in the video stream a purchase link or related advertising information for the clothing worn by a particular person in a particular scene. The embodiments of the present disclosure do not limit this.

As an example, a user can upload a TV series through the web. After the system performs content parsing on the series, it generates a video structure file for the series. The structure file can include spatio-temporal information for faces, clothing, and items (at what time and where each appears) together with scene information. The user can use this structured information to interact with viewers: for example, searching the Internet by a star's name to obtain other information about that star, or running a semantic search that combines scene and person information, such as finding the segment in which a given star first appears in a given scene. Person information can also be used for selective viewing, and so on.
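The semantic search mentioned above (finding where a given star first appears in a given scene) can be sketched over a simplified structure file. The record layout used here, with the keys `time`, `people`, and `scenes`, is a hypothetical simplification and is not the file format of the disclosure.

```python
def first_appearance(frames, person, scene):
    """Return the earliest timestamp at which person and scene co-occur.

    frames: iterable of dicts with keys 'time' (seconds), 'people' (list of
    names) and 'scenes' (list of scene labels). Returns None if the person
    never appears in that scene.
    """
    for frame in sorted(frames, key=lambda f: f["time"]):
        if person in frame["people"] and scene in frame["scenes"]:
            return frame["time"]
    return None


structure = [
    {"time": 12.0, "people": ["Actor A"], "scenes": ["office"]},
    {"time": 30.5, "people": ["Actor A", "Actor B"], "scenes": ["beach"]},
    {"time": 45.0, "people": ["Actor A"], "scenes": ["beach"]},
]
print(first_appearance(structure, "Actor A", "beach"))
```

A player could seek to the returned timestamp, which is one way the selective viewing described in the text might be realized.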

The embodiments of the present disclosure are described below using the example in which the video stream is a film or TV drama:

1. The user uploads, through the browser, a video file and an actor information file for the video, where the actor information file contains information about the actors in the video file.

For example, the video file is the first episode of the TV series "Song of Joy 2", and the actor information file contains information about the actors in "Song of Joy 2", one actor per line. Optionally, the system can also obtain the actor information file in other ways, such as a web search; the embodiments of the present disclosure do not limit this.

The browser can be part of the system or can reside on a different physical device from the system; the embodiments of the present disclosure do not limit this.

2. The system associates the actor information file with the video file and passes them to the system's video parsing engine for processing.

3. In the CPU 110 environment, the video parsing engine can divide the video file equally according to the number of GPUs 120, and then distribute the resulting sub-video streams, together with the actor information file, to the respective GPU 120 environments for the actual video parsing.

4. The GPU 120 can perform content parsing on the received sub-video stream to obtain a content parsing result. The content parsing result can include structured metadata, specifically: for each face in the current frame, the star's name, face-related attributes, and position coordinates in the picture; for the clothing a star wears in the current frame, its category, color, texture, neckline and cuff information, and position coordinates in the picture; the names of items in the current frame and their position coordinates in the picture; and the scene information of the current frame.

The user can preview a visualization of the content parsing result in the browser or download the structured metadata file of the video, which can contain the structured information of all frames of the video.
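To make the structured metadata listed in step 4 concrete, the following sketch models one frame's record and serializes a list of frames to JSON. The class names, field names, and the choice of JSON are assumptions for illustration only; the disclosure does not specify a file format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) position in the picture


@dataclass
class Face:
    name: str                   # star's name
    attributes: Dict[str, str]  # face-related attributes
    box: Box


@dataclass
class Clothing:
    category: str
    color: str
    texture: str
    neckline: str
    cuff: str
    box: Box


@dataclass
class Item:
    name: str
    box: Box


@dataclass
class FrameMetadata:
    frame_index: int
    faces: List[Face] = field(default_factory=list)
    clothing: List[Clothing] = field(default_factory=list)
    items: List[Item] = field(default_factory=list)
    scenes: List[str] = field(default_factory=list)


def to_json(frames):
    """Serialize a list of FrameMetadata records to a JSON string."""
    return json.dumps([asdict(frame) for frame in frames], ensure_ascii=False)


frame = FrameMetadata(
    frame_index=0,
    faces=[Face("Actor A", {"expression": "smiling"}, (100, 50, 140, 90))],
    clothing=[Clothing("coat", "red", "plain", "round", "long", (90, 100, 150, 220))],
    items=[Item("cup", (300, 200, 330, 240))],
    scenes=["office"],
)
print(to_json([frame]))
```

Keeping one record per frame mirrors the statement that the metadata file can contain the structured information of all frames of the video.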

The communication unit 210 can be used to send a video stream to the content parsing system in response to a received user request.

Optionally, the user can send the user request, such as a video upload/parsing request, through a browser on the electronic device.

The content parsing system can perform content parsing on the video stream to obtain the content parsing result of the video stream.

The communication unit 210 can also receive the content parsing result of the video stream sent by the content parsing system.

The storage unit 220 can be used to save the content parsing result of the video stream received by the communication unit 210.

Specifically, an electronic device independent of the content parsing system receives the user request and can also check the user's permissions, so that only video streams sent by users with permission are forwarded to the content parsing system for parsing. The electronic device sends the video stream to the content parsing system, then receives and saves the content parsing result fed back by the content parsing system. The content parsing result can then be displayed in the video stream synchronously, or displayed all together during a given period.

Optionally, the electronic device 200 can further include a rendering unit for rendering the content parsing result of the video stream into the video stream and displaying the video stream with the rendered content parsing result.

Specifically, the electronic device can use a real-time rendering technology (for example, a web rendering technology) to render the content parsing result in the browser in real time, improving the user experience. Web-based visualization presents the content parsing result more intuitively.

Fig. 3 is a schematic flowchart of the content parsing method for a video stream provided by an embodiment of the present disclosure. As shown in Fig. 3, the method 300 includes:

S301: The CPU obtains a video stream, divides the video stream into multiple sub-video streams corresponding to multiple GPUs, and distributes the sub-video streams to the corresponding GPUs.

Each sub-video stream includes at least one frame of consecutive video images from the video stream, and different GPUs correspond to different sub-video streams.

S302: Each GPU performs content parsing on the video images in its corresponding sub-video stream, obtains the content parsing result of that sub-video stream, and reports the result to the CPU.

S303: The CPU obtains the content parsing result of the video stream according to the content parsing results reported by the multiple GPUs.
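The disclosure does not detail how the CPU combines the reported results in S303. One plausible reading, sketched below under that assumption, is that each GPU reports the starting frame index of its sub-video stream together with a list of per-frame results, and the CPU concatenates the lists in frame order. The function name `merge_results` and the report layout are hypothetical.

```python
def merge_results(reports):
    """Concatenate per-GPU results into one list ordered by frame index.

    reports: list of (start_frame, per_frame_results) tuples, in whatever
    order the GPUs happened to finish.
    """
    merged = []
    for start_frame, results in sorted(reports, key=lambda report: report[0]):
        if start_frame != len(merged):
            # A gap or overlap means a sub-video stream is missing or duplicated.
            raise ValueError(f"unexpected start frame {start_frame}")
        merged.extend(results)
    return merged


# GPU 1 finished before GPU 0; the merged result is still in frame order.
reports = [(2, ["frame2", "frame3"]), (0, ["frame0", "frame1"])]
print(merge_results(reports))
```

Sorting by start frame means the GPUs can report in any order, which suits parallel processing where completion order is not predictable.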

Optionally, in a specific example, operation S301 can include:

dividing the video stream into multiple sub-video streams corresponding to the multiple GPUs according to the number of GPUs.
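A minimal sketch of dividing a video stream according to the number of GPUs follows. It assumes the stream is addressed by frame index and that each GPU receives one contiguous, near-equal range of frames; the function name `split_frames` is hypothetical.

```python
def split_frames(total_frames, num_gpus):
    """Split frame indices 0..total_frames-1 into num_gpus contiguous ranges.

    Returns a list of half-open (start, end) ranges, one per GPU. When the
    frames do not divide evenly, the first ranges each get one extra frame.
    If total_frames < num_gpus, some ranges are empty; a caller would need
    to skip those GPUs, since each sub-video stream should contain at least
    one frame.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    base, extra = divmod(total_frames, num_gpus)
    ranges = []
    start = 0
    for gpu_index in range(num_gpus):
        length = base + (1 if gpu_index < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


print(split_frames(10, 4))
```

Contiguous ranges keep consecutive frames on the same GPU, which the tracking-based steps (face and clothing tracking identifiers, shot segmentation) rely on.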

In an optional example of the present disclosure, operation S302 can include:

obtaining a person information library for the video stream, where the library includes facial images and name information of the people in the video stream; and

using the person information library of the video stream to determine the face information of at least one frame of video image in the sub-video stream corresponding to the GPU.

As an example, using the person information library of the video stream to determine the face information of at least one frame of video image in the sub-video stream corresponding to the GPU can include:

performing face detection on each frame of the at least one frame of video image of the corresponding sub-video stream to obtain at least one facial image in each frame, where facial images that correspond to the same person and appear in at least one frame of consecutive video images of the sub-video stream have the same face tracking identifier;

determining a target facial image from the at least one facial image in the sub-video stream that corresponds to the same face tracking identifier; and

comparing the target facial image with the facial images in the person information library of the video stream, to determine the person corresponding to the face tracking identifier of the target facial image.
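The disclosure does not specify how the face comparison is computed. A common approach, used here purely as an assumption, is to represent each face as a feature vector and pick the library entry with the highest cosine similarity, rejecting matches below a minimum similarity. The function names, the vector representation, and the default threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(target_vector, library, min_similarity=0.5):
    """Return the name in the library that best matches target_vector.

    library: dict mapping a person's name to the feature vector of that
    person's facial image. Returns None when no entry reaches min_similarity.
    """
    best_name, best_score = None, min_similarity
    for name, vector in library.items():
        score = cosine_similarity(target_vector, vector)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


library = {"Actor A": [1.0, 0.0], "Actor B": [0.0, 1.0]}
print(identify([0.9, 0.1], library))
```

Because the comparison runs once per face tracking identifier (on the target facial image) rather than once per frame, the number of library lookups stays small even for long sub-video streams.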

Optionally, operation S302 can also include:

performing clothing detection on each frame of video image according to the facial images detected by the face recognition module in each frame of the at least one frame of video image of the sub-video stream, to obtain at least one clothing image in each frame, where clothing images in the sub-video stream that correspond to the same face tracking identifier and to the same garment have the same clothing tracking identifier; and

determining the garment corresponding to a clothing tracking identifier according to at least one clothing image in the sub-video stream that has that clothing tracking identifier.

In an example of the present disclosure, the method further includes:

establishing an association between the clothing tracking identifier and the face tracking identifier, where the content parsing result includes the person information corresponding to the face tracking identifier and the clothing information corresponding to the clothing tracking identifier associated with the face tracking identifier.

Optionally, the method further includes: the CPU rendering the content parsing result of the video stream into the video stream. The video stream can also be further processed, for example by displaying in the video stream a purchase link or related advertising information for the clothing worn by a particular person in a particular scene. The embodiments of the present disclosure do not limit this.

Optionally, the CPU obtaining a video stream includes: the CPU obtaining a video stream uploaded by a user over the Internet.

Specifically, the video stream received by the CPU can be uploaded by the user over the Internet or uploaded locally. As long as the upload means is legitimate, the CPU can receive the video stream; the present disclosure does not limit the upload method.

Optionally, before the CPU obtains the video stream uploaded by the user over the Internet, the method further includes: authenticating the user in response to the user's video upload request.

Specifically, the CPU can implement video data management functions and is responsible for user management and authorization, managing user permissions, and so on. In practice, after a user request is received, the user's permissions are checked: only requests sent by users with permission are processed, and requests sent by users without permission are answered directly with a rejection message.

According to one aspect of the embodiments of the present disclosure, an electronic device is provided, including: a memory for storing executable instructions;

and a processor for communicating with the memory to execute the executable instructions so as to complete the operations of the content parsing method for a video stream of any of the above embodiments.

According to one aspect of the embodiments of the present disclosure, a computer program is provided, including computer-readable code, where, when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps of the content parsing method for a video stream of the above embodiments of the present disclosure.

According to one aspect of the embodiments of the present disclosure, a computer storage medium is provided for storing computer-readable instructions, where the instructions, when executed, perform the operations of the content parsing method for a video stream of any of the above embodiments.

An embodiment of the present invention further provides an electronic device, which can be, for example, a mobile terminal, a personal computer (PC), a tablet computer, or a server. Referring to Fig. 4, which shows a schematic structural diagram of an electronic device 400 suitable for implementing a terminal device or server of the embodiments of the present application: as shown in Fig. 4, the computer system 400 includes one or more processors, a communication unit, and so on. The one or more processors are, for example, one or more central processing units (CPU) 401 and/or one or more graphics processing units (GPU) 413. The processor can perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 402 or executable instructions loaded from a storage section 408 into a random access memory (RAM) 403. The communication unit 412 can include, but is not limited to, a network card, and the network card can include, but is not limited to, an IB (InfiniBand) network card.

The processor can communicate with the read-only memory 402 and/or the random access memory 403 to execute executable instructions, is connected to the communication unit 412 through a bus 404, and communicates with other target devices through the communication unit 412, thereby completing the operations corresponding to any method provided by the embodiments of the present application, for example: the CPU obtains a video stream, divides the video stream into multiple sub-video streams corresponding to multiple GPUs, and distributes the sub-video streams to the corresponding GPUs; each GPU performs content parsing on the video images in its corresponding sub-video stream, obtains the content parsing result of that sub-video stream, and reports it to the CPU; and the CPU obtains the content parsing result of the video stream according to the content parsing results reported by the multiple GPUs.

In addition, the RAM 403 can also store various programs and data required for the operation of the device. The CPU 401, the ROM 402, and the RAM 403 are connected to one another through the bus 404. Where the RAM 403 is present, the ROM 402 is an optional module. The RAM 403 stores executable instructions, or executable instructions are written into the ROM 402 at runtime, and the executable instructions cause the processor 401 to perform the operations corresponding to the above method. An input/output (I/O) interface 405 is also connected to the bus 404. The communication unit 412 can be integrated, or can be provided with multiple sub-modules (for example, multiple IB network cards) that are each linked to the bus.

The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage section 408 as needed.

It should be noted that the architecture shown in Fig. 4 is only one optional implementation. In practice, the number and types of the components in Fig. 4 can be selected, deleted, added, or replaced according to actual needs. Different functional components can be arranged separately or integrated: for example, the GPU and the CPU can be arranged separately, or the GPU can be integrated on the CPU; the communication unit can be arranged separately, or can be integrated on the CPU or the GPU; and so on. These alternative implementations all fall within the scope of protection of the present disclosure.

In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product that includes a computer program tangibly embodied on a machine-readable medium. The computer program contains program code for executing the method shown in the flowchart, and the program code can include instructions corresponding to the method steps provided by the embodiments of the present application, for example: the CPU obtains a video stream, divides the video stream into multiple sub-video streams corresponding to multiple GPUs, and distributes the sub-video streams to the corresponding GPUs; each GPU performs content parsing on the video images in its corresponding sub-video stream, obtains the content parsing result of that sub-video stream, and reports it to the CPU; and the CPU obtains the content parsing result of the video stream according to the content parsing results reported by the multiple GPUs. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 409 and/or installed from the removable medium 411. When the computer program is executed by the central processing unit (CPU) 401, the above functions defined in the method of the present application are performed.

The method, apparatus, and device of the present invention can be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the method steps is for illustration only; the steps of the method of the present invention are not limited to the order specifically described above unless otherwise stated. In addition, in some embodiments, the present invention can also be implemented as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the method according to the present invention. The present invention therefore also covers a recording medium storing a program for executing the method according to the present invention.

The description of the present invention is given for the purposes of example and description and is not exhaustive, nor does it limit the invention to the disclosed form. Many modifications and variations will be obvious to those of ordinary skill in the art. The embodiments were selected and described to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and design various embodiments with various modifications suited to particular uses.

Claims

1. A content parsing system for a video stream, characterized by comprising:

a central processing unit (CPU) and multiple graphics processing units (GPUs), wherein

the CPU is configured to obtain a video stream, divide the video stream into multiple sub-video streams corresponding to the multiple GPUs, and distribute the multiple sub-video streams to the corresponding GPUs, wherein each sub-video stream includes at least one frame of consecutive video images from the video stream, and different GPUs correspond to different sub-video streams;

each GPU is configured to perform content parsing on the video images in the sub-video stream corresponding to that GPU, obtain the content parsing result of the corresponding sub-video stream, and report the content parsing result of the corresponding sub-video stream to the CPU; and

the CPU is further configured to obtain the content parsing result of the video stream according to the content parsing results reported by the multiple GPUs.

2. The system according to claim 1, characterized in that the CPU dividing the video stream into multiple sub-video streams corresponding to the multiple GPUs comprises:

the CPU dividing the video stream into multiple sub-video streams corresponding to the multiple GPUs according to the number of the multiple GPUs.

3. The system according to claim 1 or 2, characterized in that the content parsing result includes at least one of the following: person information, clothing information, item information, and scene information.

4. The system according to claim 3, characterized in that the clothing information includes at least one of the following: category information, color information, texture information, neckline information, and cuff information of the clothing, and image coordinate information of the clothing; and/or

the face information includes at least one of the following: name information, face attribute information, and image coordinate information of the face.

5. The system according to any one of claims 1 to 4, characterized in that the GPU includes a face recognition module configured to:

obtain a person information library for the video stream, where the library includes facial images and name information of the people in the video stream; and

use the person information library of the video stream to determine the face information of at least one frame of video image in the sub-video stream corresponding to the GPU.

6. An electronic device, characterized by comprising:

a communication unit configured to send a video stream to a content parsing system in response to a user's video upload request, and to receive the content parsing result of the video stream sent by the content parsing system;

7. A content parsing method for a video stream, characterized in that the method is applied to a content parsing system including a central processing unit (CPU) and multiple graphics processing units (GPUs), and comprises:

the CPU obtaining a video stream, dividing the video stream into multiple sub-video streams corresponding to the multiple GPUs, and distributing the multiple sub-video streams to the corresponding GPUs, wherein each sub-video stream includes at least one frame of consecutive video images from the video stream, and different GPUs correspond to different sub-video streams;

each GPU performing content parsing on the video images in the sub-video stream corresponding to that GPU, obtaining the content parsing result of the corresponding sub-video stream, and reporting the content parsing result of the corresponding sub-video stream to the CPU;

8. An electronic device, characterized by comprising: a memory for storing executable instructions;

and a processor for communicating with the memory to execute the executable instructions so as to complete the operations of the content parsing method for a video stream according to claim 7.

9. A computer storage medium for storing computer-readable instructions, characterized in that the instructions, when executed, perform the operations of the content parsing method for a video stream according to claim 7.

10. A computer program including computer-readable code, characterized in that, when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps of the content parsing method for a video stream according to claim 7.