CN110134830A - Video information data processing method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN110134830A
CN110134830A (application CN201910301087.9A)
Authority
CN
China
Prior art keywords
video
annotation
sub-model
voiceprint
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910301087.9A
Other languages
Chinese (zh)
Inventor
谭莉 (Tan Li)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd
Priority to CN201910301087.9A
Publication of CN110134830A
Priority to PCT/CN2019/122841 (published as WO2020211392A1)
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 — Information retrieval of video data
    • G06F 16/74 — Browsing; Visualisation therefor
    • G06F 16/75 — Clustering; Classification
    • G06F 16/78 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 — Retrieval using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application relates to the field of big data and provides a video information data processing method, apparatus, computer device, and storage medium. The method includes: obtaining a trained extraction model and a trained data association model; inputting a current video into the extraction model to extract multiple target video elements of the video; matching each target video element against the key features stored in the data association model; taking each successfully matched key feature as a target key feature; obtaining the associated data stored in the data association model that corresponds to the target key feature; and displaying the associated data as display information. The extraction model extracts video elements accurately, and the data association model then selects the associated data corresponding to the key features that match the video elements with high accuracy, so that more accurate information can be pushed to the user, improving the efficiency and accuracy of video information data processing.

Description

Video information data processing method, device, computer equipment and storage medium
Technical field
This application relates to the field of computer technology, and in particular to a video information data processing method, apparatus, computer device, and storage medium.
Background technique
With the rapid development of computer multimedia technology, video, as an important component of multimedia, plays an increasingly active role in people's lives, and people can obtain rich information by watching videos. Currently, when watching a video, a viewer who wants to learn more about its content can search for related information through a browser using keywords drawn from the video. However, when the viewer lacks background knowledge of the video content and has watched only an incomplete segment, for example a video advertisement of only a few seconds, the viewer may be interested in the content yet unable to identify the specific information involved, such as the exact product model shown in the advertisement. When the viewer later searches using keywords recalled from watching the video, the results are limited by keyword accuracy, and the information obtained is often inaccurate, making a precise search impossible.
Summary of the invention
Accordingly, to address the technical problem that users cannot accurately learn more about video content, it is necessary to provide an efficient and highly accurate video information data processing method, apparatus, computer device, and storage medium.
A video information data processing method, the method comprising:
obtaining a trained extraction model and a trained data association model;
inputting a current video into the extraction model and extracting multiple target video elements of the current video;
matching each target video element against the key features stored in the data association model;
taking each successfully matched key feature as a target key feature; and
obtaining the associated data stored in the data association model that corresponds to the target key feature, and displaying the associated data as display information.
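As a rough illustration, the sketch below strings the claimed steps together: extract elements from a video, match each against stored key features, and collect the associated data of successful matches as display information. Every name here, and the token-overlap matching metric, is a hypothetical stand-in, not the patent's actual models.

```python
def match_score(element, key_feature):
    """Toy similarity: Jaccard overlap of word tokens (a placeholder for a
    real image/speech/voiceprint matching metric)."""
    a, b = set(element.split()), set(key_feature.split())
    return len(a & b) / max(len(a | b), 1)

def process_video(video, extraction_model, association_model, threshold=0.8):
    """Run the claimed pipeline on one video and return the display information."""
    display_info = []
    # Extract the target video elements of the current video.
    for element in extraction_model(video):
        # Match each element against every stored key feature and collect
        # the associated data of each successful match.
        for key_feature, associated_data in association_model.items():
            if match_score(element, key_feature) >= threshold:
                display_info.append(associated_data)
    return display_info
```

For example, with an extraction model that returns `["red car sedan"]` and an association model mapping `"red car sedan"` to the car's details, `process_video` returns those details as the display information.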
In one embodiment, the extraction model includes an image sub-model, a speech sub-model, and a voiceprint sub-model, and training the extraction model includes:
obtaining a training sample set, the training sample set including an annotated image set, an annotated speech set, and an annotated voiceprint set, where the annotated images in the annotated image set include image features, the annotated speech in the annotated speech set includes speech features, and the annotated voiceprints in the annotated voiceprint set include voiceprint features;
obtaining a training video;
inputting the annotated image set and the training video into an initial image sub-model, the initial image sub-model segmenting the training video and extracting segmented images of the training video;
inputting the annotated speech set and the training video into an initial speech sub-model, the initial speech sub-model segmenting the training video and extracting segmented audio of the training video;
inputting the annotated voiceprint set and the training video into an initial voiceprint sub-model, the initial voiceprint sub-model segmenting the training video and extracting segmented voiceprints of the training video; and
adjusting the model parameters of the initial image sub-model according to the segmented images of the training video and the image features, adjusting the model parameters of the initial speech sub-model according to the segmented audio of the training video and the speech features, and adjusting the model parameters of the initial voiceprint sub-model according to the segmented voiceprints of the training video and the voiceprint features, until the initial image sub-model, the initial speech sub-model, and the initial voiceprint sub-model all satisfy a convergence condition, thereby obtaining the trained extraction model.
In one embodiment, the data association model includes multiple data association sub-models, and establishing the data association model includes:
storing the annotated images in the annotated image set, the annotated speech in the annotated speech set, and the annotated voiceprints in the annotated voiceprint set as the key features;
obtaining samples to be associated;
converting the samples to be associated, by type, into text, obtaining text information data, and storing it;
establishing the data association model by establishing an association relationship between each key feature and the text information data, the data association model being configured to select, according to a key feature, the association relationship corresponding to that key feature, and to obtain associated data from the text information data according to that association relationship;
obtaining preset business scenes; and
classifying the data association model by supervised learning according to the preset business scenes to obtain multiple data association sub-models, each data association sub-model containing key features of the same type and the associated data corresponding to those key features.
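The establishment steps above can be sketched as follows, with the samples-to-text conversion reduced to a plain string cast and the business-scene split reduced to grouping by a scene label; all structures are illustrative assumptions, not the patented implementation (in particular, the real split is described as supervised learning, which this grouping only approximates).

```python
from collections import defaultdict

def build_association_model(key_features, samples):
    """key_features: list of (feature, scene) pairs; samples: feature -> raw sample.
    Converts each raw sample to text ('text information data') and links it to
    its key feature, forming the association relationships."""
    model = {}
    for feature, scene in key_features:
        model[feature] = {"scene": scene, "data": str(samples.get(feature))}
    return model

def split_by_scene(model):
    """Group the association model into per-business-scene sub-models."""
    submodels = defaultdict(dict)
    for feature, entry in model.items():
        submodels[entry["scene"]][feature] = entry["data"]
    return dict(submodels)
```

A lookup by key feature inside the matching scene's sub-model then yields the associated data directly.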
In one embodiment, before the step of obtaining the trained extraction model and data association model, the method further includes:
receiving a video extraction instruction, the video extraction instruction carrying an actual business scene identifier; and
selecting the trained data association model from the data association sub-models according to the actual business scene identifier.
In one embodiment, displaying the display information includes:
obtaining a display mode for the display information; and
scrolling the display information across the current video in the form of a bullet-screen (danmaku) comment according to the display mode; or
popping up a floating frame for a preset time according to the display mode and showing the display information in the floating frame.
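A minimal sketch of the two display modes above, assuming a danmaku-style scroll and a timed floating frame; the render-instruction dictionaries are invented for illustration and do not reflect any actual UI API.

```python
def render(display_info, mode, duration=5):
    """Return a simple render instruction for the chosen display mode."""
    if mode == "danmaku":
        # Scroll the display information across the playing video like a
        # bullet-screen comment.
        return {"type": "scroll", "text": display_info}
    if mode == "floating":
        # Pop up a floating frame for `duration` seconds and show the
        # display information inside it.
        return {"type": "popup", "text": display_info, "seconds": duration}
    raise ValueError(f"unknown display mode: {mode}")
```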
A video information data processing apparatus, the apparatus comprising:
a model obtaining module, configured to obtain a trained extraction model and a trained data association model;
a video processing module, configured to input a current video into the extraction model and extract multiple target video elements of the current video, and further configured to match each target video element against the key features stored in the data association model; and
an information display module, configured to take each successfully matched key feature as a target key feature, and further configured to obtain the associated data stored in the data association model that corresponds to the target key feature and to display the associated data as display information.
In one embodiment, the apparatus further includes:
a sample obtaining module, configured to obtain a training sample set, the training sample set including an annotated image set, an annotated speech set, and an annotated voiceprint set, where the annotated images in the annotated image set include image features, the annotated speech in the annotated speech set includes speech features, and the annotated voiceprints in the annotated voiceprint set include voiceprint features, and further configured to obtain a training video; and
an extraction model training module, configured to input the annotated image set and the training video into an initial image sub-model, the initial image sub-model segmenting the training video and extracting segmented images of the training video; to input the annotated speech set and the training video into an initial speech sub-model, the initial speech sub-model segmenting the training video and extracting segmented audio of the training video; to input the annotated voiceprint set and the training video into an initial voiceprint sub-model, the initial voiceprint sub-model segmenting the training video and extracting segmented voiceprints of the training video; and further configured to adjust the model parameters of the initial image sub-model according to the segmented images of the training video and the image features, adjust the model parameters of the initial speech sub-model according to the segmented audio of the training video and the speech features, and adjust the model parameters of the initial voiceprint sub-model according to the segmented voiceprints of the training video and the voiceprint features, until the initial image sub-model, the initial speech sub-model, and the initial voiceprint sub-model all satisfy a convergence condition, thereby obtaining the trained extraction model.
In one embodiment, the apparatus further includes:
the sample obtaining module, further configured to obtain samples to be associated and to obtain preset business scenes;
a data association model establishment module, configured to store the annotated images in the annotated image set, the annotated speech in the annotated speech set, and the annotated voiceprints in the annotated voiceprint set as the key features; to convert the samples to be associated, by type, into text, obtaining and storing text information data; and further configured to establish the data association model by establishing an association relationship between each key feature and the text information data, the data association model being configured to select, according to a key feature, the association relationship corresponding to that key feature and to obtain associated data from the text information data according to that association relationship; and
a data association model classification module, configured to classify the data association model by supervised learning according to the preset business scenes to obtain multiple data association sub-models, each data association sub-model containing key features of the same type and the associated data corresponding to those key features.
A computer device, including a memory and a processor, the memory storing a computer program executable on the processor, where the processor, when executing the computer program, implements the steps in each of the method embodiments described above.
A computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the steps in each of the method embodiments described above.
In the above video information data processing method, apparatus, computer device, and storage medium, a trained extraction model and a trained data association model are obtained; a current video is input into the extraction model to extract multiple target video elements of the video; each target video element is matched against the key features stored in the data association model; each successfully matched key feature is taken as a target key feature; the associated data stored in the data association model that corresponds to the target key feature is obtained; and the associated data is displayed as display information. The extraction model extracts the elements in the video accurately; the video elements are then matched against the key features in the data association model, and the associated data corresponding to the key features that match the video elements with high accuracy is selected as display information. By displaying this information, information relevant to the video elements can be pushed to the user, so that even a user who knows nothing about the original video can obtain precise recommendations and perform precise searches through this solution, enabling the user to learn more about the video content accurately and efficiently.
Detailed description of the invention
Fig. 1 is an application scenario diagram of a video information data processing method in one embodiment;
Fig. 2 is a schematic flowchart of the video information data processing method in one embodiment;
Fig. 3 is a schematic flowchart of the training steps of the extraction model in one embodiment;
Fig. 4 is a schematic flowchart of the establishment steps of the data association model in one embodiment;
Fig. 5 is a schematic flowchart of the video information data processing method in another embodiment;
Fig. 6 is a schematic flowchart of displaying the display information in one embodiment;
Fig. 7 is a structural block diagram of a video information data processing apparatus in one embodiment;
Fig. 8 is a diagram of the internal structure of a computer device in one embodiment.
Specific embodiment
To make the objects, technical solutions, and advantages of the present application clearer, the application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the application and are not intended to limit it.
The video information data processing method provided by this application can be applied in the application environment shown in Fig. 1. Fig. 1 is a diagram of the application environment in which the video information data processing method runs in one embodiment. As shown in Fig. 1, the application environment includes a terminal 110 and a server 120, which communicate over a network; the communication network can be a wireless or wired communication network, such as an IP network or a cellular mobile communication network, and the number of terminals and servers is not limited. It should be noted that in this solution the video information data processing method can be performed entirely on the terminal 110, or performed entirely on the server 120, in which case the server 120 sends the display information produced by processing the video information data to the terminal 110 for display.
The terminal 110 can be, but is not limited to, a personal computer, a laptop, a smartphone, a tablet computer, or a portable wearable device. The server 120 can be implemented as an independent server or as a server cluster composed of multiple servers. The terminal or server obtains a trained extraction model and a trained data association model, inputs a current video into the extraction model, extracts multiple target video elements of the current video, matches each target video element against the key features stored in the data association model, takes each successfully matched key feature as a target key feature, obtains the associated data stored in the data association model that corresponds to the target key feature, and displays the associated data as display information. Through the displayed information, information relevant to the video elements can be pushed to the user, so that even a user who knows nothing about the original video can obtain precise recommendations and perform precise searches through this solution, enabling the user to learn more about the video content accurately and efficiently.
In one embodiment, as shown in Fig. 2, a video information data processing method is provided. The method is described here as applied to the server or terminal of Fig. 1, and includes the following steps:
Step 210: obtain a trained extraction model and a trained data association model.
Here, the extraction model is a model for extracting the elements in a video, and can be a model based on a deep full-sequence convolutional neural network. The data association model is a model for performing big data analysis according to the elements extracted from the video to obtain associated data.
Step 220: input the current video into the extraction model and extract multiple target video elements of the current video.
Specifically, the current video is the video currently being played. It can be a video shot by the terminal using its camera or a video stored on the terminal. The current video can be obtained by tapping a "video search" button on the APP interface and then selecting a video stored on the terminal, or by opening the camera and recording a video playing on another terminal, for example recording any advertisement or film, or by opening the camera and shooting people or everyday objects. The "video search" button can be placed at any position on any APP home page. Target video elements are the elements in the video, and can be specific products, portraits, music, dialogue, or the voiceprints of the people appearing in the video. For example, if the current video is a car show advertisement, the cars in the advertisement, the faces of the celebrities, the background music, the advertising copy, and the voiceprint features of the people can all be regarded as video elements.
Step 230: match each target video element against the key features stored in the data association model.
Step 240: take each successfully matched key feature as a target key feature.
Specifically, key features are features stored in the data association model, and can be images, speech, or voiceprints. In one embodiment, matching means comparing a target video element with a key feature in the data association model and computing a matching value; a matching standard value can be set, and the match is considered successful when the matching value reaches the matching standard value.
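One plausible way to realize the matching value and matching standard value described above is a similarity score over feature vectors with a threshold. Cosine similarity is used here as an assumed metric, since the patent does not specify one.

```python
import math

def cosine_match(vec_a, vec_b):
    """Matching value between two feature vectors (cosine similarity)."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm = math.sqrt(sum(a * a for a in vec_a)) * math.sqrt(sum(b * b for b in vec_b))
    return dot / norm if norm else 0.0

def is_match(element_vec, key_vec, standard=0.9):
    """The match succeeds when the matching value reaches the standard value."""
    return cosine_match(element_vec, key_vec) >= standard
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, so the matching standard value directly controls how close a video element must be to a stored key feature.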
Step 250: obtain the associated data stored in the data association model that corresponds to the target key feature, and display the associated data as display information.
Specifically, a target key feature is a key feature successfully matched by a target video element, and associated data is data stored in the data association model in one-to-one correspondence with the key features; that is, the data association model establishes an association relationship between key features and associated data. Display information is the relevant information pushed to the user; it can be associated data obtained from the data association model according to the target key feature and displayed on the terminal. For example, if the current video is a car show advertisement, after the above video information processing method, the display information shown on the terminal can be the name, model, price, region of origin, and other information of the cars appearing in the video.
In this embodiment, a trained extraction model and a trained data association model are obtained; the current video is input into the extraction model to extract multiple target video elements of the video; each target video element is matched against the key features stored in the data association model; each successfully matched key feature is taken as a target key feature; the associated data stored in the data association model that corresponds to the target key feature is obtained; and the associated data is displayed as display information. The extraction model extracts the elements in the video accurately, the video elements are matched against the key features in the data association model, and the associated data corresponding to the key features that match the video elements with high accuracy is selected as display information. By displaying this information, information relevant to the video elements can be pushed to the user. Even a user who knows nothing about the original video, watching only a few seconds of an advertisement or trailer, can quickly obtain rich display information, obtain accurate information by viewing it, and then search according to their own interests, achieving a precise search and enabling the user to learn more about the video content accurately and efficiently.
In one embodiment, as shown in Fig. 3, a video information data processing method is provided in which the extraction model includes an image sub-model, a speech sub-model, and a voiceprint sub-model, and training the extraction model includes the following steps:
Step 310: obtain a training sample set, the training sample set including an annotated image set, an annotated speech set, and an annotated voiceprint set, where the annotated images in the annotated image set include image features, the annotated speech in the annotated speech set includes speech features, and the annotated voiceprints in the annotated voiceprint set include voiceprint features.
Specifically, the training sample set is the set of samples used to train the extraction model. It can cover text and image samples that are domestic and foreign, online and offline, from different channels, and both real and simulated. The training samples can be obtained by purchase or by web crawling, or from the database of the business software to which the above video information data processing method is applied. The annotated image set in the training sample set consists of images that have been annotated, the annotated speech set consists of speech that has been annotated, and the annotated voiceprint set consists of voiceprints that have been annotated. Image features are the specific features of an annotated image and can be used to distinguish different annotated images. Speech features are the specific features of annotated speech and can be used to distinguish different annotated speech. Voiceprint features are features that distinguish different annotated voiceprints.
Step 320: obtain a training video.
Specifically, the training video is a video sample used to train the extraction model. Training videos can cover different videos that are domestic and foreign, online and offline, and from different channels. The videos can be obtained by purchase or by web crawling, or from the database of the business software to which the above video information data processing method is applied.
Step 330A inputs mark image set and training video in initial pictures submodel, initial pictures submodel pair Training video is split, and extracts the segmented image of training video.
Step 330B inputs mark voice collection and training video in initial speech submodel, initial speech submodel pair Training video is split, and extracts the segmentation audio of training video.
Step 330C inputs mark vocal print collection and training video in initial vocal print submodel, initial vocal print submodel pair Training video is split, and extracts the segmentation vocal print of training video.
Specifically, training video is split by initial pictures submodel, is referred to training video according to default frame It is split, extracts the image in video similar to the method for screenshot, wherein segmented image just refers to each frame in video Image.Training video is split by initial speech submodel, refers to the audio extracted in training video.Divide audio Including the music VF in training video, voice dialogue audio.Training video is split by initial vocal print submodel, is Finger is split the vocal print feature for extracting personage to training video.Segmentation vocal print includes the vocal print of different personages in training video Feature, such as boy student's vocal print and schoolgirl's vocal print have different vocal print features, deposit again in specific boy student's vocal print and schoolgirl's vocal print In different vocal print features.
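The frame-interval segmentation described above ("similar to taking screenshots") can be illustrated by sampling frame indices at a preset interval; the helper below is hypothetical, not part of the patent.

```python
def split_frames(num_frames, fps, step_seconds=1.0):
    """Return the indices of frames sampled every `step_seconds`, i.e. the
    preset frame interval at which the image sub-model takes its 'screenshots'."""
    step = max(int(fps * step_seconds), 1)
    return list(range(0, num_frames, step))
```

For a 4-second clip at 25 fps sampled once per second, this yields frame indices 0, 25, 50, and 75; those frames would then be passed to the image sub-model.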
Step 340: adjust the model parameters of the initial image sub-model according to the segmented images of the training video and the image features; adjust the model parameters of the initial speech sub-model according to the segmented audio of the training video and the speech features; and adjust the model parameters of the initial voiceprint sub-model according to the segmented voiceprints of the training video and the voiceprint features, until the initial image sub-model, the initial speech sub-model, and the initial voiceprint sub-model all satisfy a convergence condition, obtaining the trained extraction model.
Specifically, adjusting the model parameters means tuning the sub-models so that the finally obtained trained extraction model can accurately extract the desired segmented images, segmented audio, and segmented voiceprints. Adjusting the model parameters of the initial image sub-model allows the adjusted initial image sub-model to extract segmented images closer to the annotated images. Similarly, adjusting the model parameters of the initial speech sub-model allows the adjusted initial speech sub-model to extract segmented audio closer to the annotated speech, and adjusting the model parameters of the initial voiceprint sub-model allows the adjusted initial voiceprint sub-model to extract segmented voiceprints closer to the annotated voiceprints. In this embodiment, training the extraction model yields a trained extraction model, so that when the current video is input into the extraction model, the elements in the current video can be extracted more quickly and more accurately.
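The "adjust parameters until a convergence condition is met" loop of Step 340 can be sketched generically: repeat an update step until the loss stops improving. The loop below is a minimal sketch under that reading; the actual update rule and loss of the patented sub-models are not specified.

```python
def train_until_converged(params, step_fn, loss_fn, tol=1e-6, max_iters=1000):
    """Adjust `params` with `step_fn` until the loss improvement falls below
    `tol` (the convergence condition) or `max_iters` is reached."""
    prev = loss_fn(params)
    for _ in range(max_iters):
        params = step_fn(params)          # one parameter-adjustment step
        cur = loss_fn(params)
        if abs(prev - cur) < tol:         # convergence condition satisfied
            break
        prev = cur
    return params
```

For instance, with the illustrative loss (p - 2)^2 and update p -= 0.2 * (p - 2), the loop settles near 2, the point where segmented output would best match the annotations.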
In one embodiment, as shown in Fig. 4, a video information data processing method is provided in which the data association model includes multiple data association sub-models, and establishing the data association model includes:
Step 410: store the annotated images in the annotated image set, the annotated speech in the annotated speech set, and the annotated voiceprints in the annotated voiceprint set as the key features.
Step 420: obtain samples to be associated, convert the samples to be associated, by type, into text, obtain text information data, and store it.
Step 430: establish the data association model by establishing an association relationship between each key feature and the text information data, the data association model being configured to select, according to a key feature, the association relationship corresponding to that key feature, and to obtain associated data from the text information data according to that association relationship.
Specifically, the key features are the annotated images in the annotated image set, the annotated speech in the annotated speech set, and the annotated voiceprints in the annotated voiceprint set of the extraction model. The samples to be associated are data of different types obtained through a large number of different channels by crawling or purchase, and are equivalent to a database. Text information data is obtained by converting all the data types of the samples to be associated into text. Establishing the data association model means screening and classifying the text information data through big data analysis and establishing a one-to-one relationship between each key feature and the associated data screened and classified from the text information data. That is, the associated data is screened and classified from the text information data, each key feature corresponds to one piece of associated data, and the corresponding associated data can be obtained through the key feature.
Step 440: obtain preset business scenarios, classify the data association model by supervised learning according to the preset business scenarios, and obtain multiple data association sub-models, each data association sub-model containing key features of a same type and the associated data corresponding to the key features of that type.
Specifically, a preset business scenario is a preset business field to which the video information data processing method is to be applied. Preset business scenarios include, but are not limited to, APPs for various fields such as automobiles, clothes shopping, film and television, and music, as well as websites for such fields.
Specifically, supervised learning is one of the methods of machine learning; supervised learning is what is commonly called classification. Typical examples of supervised learning are KNN and SVM. The core idea of the KNN algorithm is that, if most of the k nearest neighbors of a sample in feature space belong to a certain category, the sample also belongs to that category and shares the characteristics of that category. SVM (Support Vector Machine) is a common discriminative method, usually used for pattern recognition, classification, and regression analysis. Classifying the data association model by supervised learning according to the preset business scenarios means classifying the key features and their corresponding associated data in the data association model through supervised learning, so that each data association sub-model contains only key features of the same type and the associated data corresponding to those key features. For example, suppose the preset business scenarios include a vehicle APP scenario, a film-and-television APP scenario, and a clothing APP scenario; classifying the data association model by supervised learning then yields a vehicle model, a film-and-television model, and a clothing model as data association sub-models. Key features of the same type and their corresponding associated data can then be trained in separate data association sub-models.
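As a concrete illustration of the KNN idea described above, here is a minimal pure-Python sketch; the 2-D feature vectors and class labels are invented purely for illustration:

```python
# Minimal KNN sketch: a sample is assigned the majority class among its
# k nearest annotated neighbors in feature space.
from collections import Counter

def knn_classify(sample, labeled_points, k=3):
    """labeled_points: list of ((x, y), class_label) pairs."""
    by_distance = sorted(
        labeled_points,
        key=lambda p: (p[0][0] - sample[0]) ** 2 + (p[0][1] - sample[1]) ** 2,
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy training set: key features embedded as 2-D points per business field.
points = [((0, 0), "vehicle"), ((0, 1), "vehicle"),
          ((5, 5), "clothing"), ((5, 6), "clothing"), ((6, 5), "clothing")]
print(knn_classify((5, 5), points))  # → clothing
```

An SVM would instead learn a separating hyperplane between the classes, but the classification role it plays in building the sub-models is the same.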
In one embodiment, as shown in FIG. 5, a video information data processing method is provided. Before step 210, the method further includes:
Step 510: receive a video extraction instruction, the video extraction instruction carrying an actual business scenario identifier.
Step 520: select the trained data association model from the data association sub-models according to the actual business scenario identifier.
Specifically, an actual business scenario identifier is a mark distinguishing the business scenario of the actual application from other business scenarios; a business scenario identifier denotes a business field of a given type. For example, different business fields such as clothing, vehicles, cosmetics, film and television, and music are marked with different business scenario identifiers. Further, the actual business scenario identifier may be obtained from the APP software currently used by the user. In one embodiment, if the user is using a film-and-television APP, the actual business scenario identifier is a film-and-television identifier, and the trained data association model is selected from the data association sub-models according to the film-and-television identifier. Selecting the trained data association model in this way allows information relevant to the film-and-television field to be displayed in a targeted manner, improving the efficiency and accuracy of processing video information data.
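Step 520 can be pictured as a lookup in a registry of trained sub-models keyed by business scenario identifier; the identifiers and placeholder sub-models in this sketch are hypothetical:

```python
# Hypothetical registry of trained data association sub-models, keyed by
# the business scenario identifier carried in the video extraction instruction.
SUBMODELS = {
    "film": "film-and-television sub-model",
    "vehicle": "vehicle sub-model",
    "clothing": "clothing sub-model",
}

def select_submodel(extraction_instruction):
    scene_id = extraction_instruction["scene_id"]
    try:
        return SUBMODELS[scene_id]
    except KeyError:
        raise ValueError(f"no trained sub-model for scenario {scene_id!r}")

print(select_submodel({"scene_id": "film"}))  # → film-and-television sub-model
```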
In one embodiment, as shown in FIG. 6, a video information data processing method is provided, and displaying the display information includes:
Step 610: obtain the display mode of the display information.
Step 620: scroll the display information across the current video in the form of a barrage according to the display mode.
Step 630: pop up a floating frame for a preset time according to the display mode, and show the display information in the floating frame.
Specifically, a barrage is a way of scrolling information across the currently playing video. A floating frame is a display method that is as unobtrusive as possible while still presenting information to the user. The floating frame is shown for a limited time and disappears automatically after a user-set display time; the preset time here is that user-set display time of the floating frame. The display information is information pushed to the user while watching the video, forming a kind of accurate information recommendation; the user can then search precisely for the information seen, forming a kind of precise information search. It should be understood that the display modes of the display information include, but are not limited to, the barrage mode and the floating-frame mode.
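The two display modes of steps 620-630 can be sketched as a simple dispatch; the function and field names below are illustrative assumptions, not part of the patent:

```python
# Hypothetical display-mode dispatch for the display information.
def show(info, mode, preset_time=5):
    if mode == "barrage":
        # Step 620: scroll the text across the currently playing video.
        return f"barrage: scrolling {info!r} over the current video"
    if mode == "floating_frame":
        # Step 630: pop up a frame that disappears after the preset time.
        return f"floating frame ({preset_time}s): {info!r}"
    raise ValueError(f"unsupported display mode: {mode}")

print(show("red sedan, 2019 model", "barrage"))
```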
It should be understood that although the steps in the flowcharts of FIGS. 2-6 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated. Unless expressly stated otherwise herein, there is no strict ordering on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-6 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is also not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 7, a video information data processing apparatus is provided, including a model obtaining module 710, a video processing module 720, and an information display module 730, wherein:
the model obtaining module 710 is configured to obtain the trained extraction model and data association model;
the video processing module 720 is configured to input the current video into the extraction model, extract multiple target video elements of the current video, and match each target video element against the key features stored in the data association model; and
the information display module 730 is configured to take the successfully matched key feature as a target key feature, obtain the associated data corresponding to the target key feature stored in the data association model, and display the associated data as display information.
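The flow through these three modules, i.e. extracting target video elements, matching them against stored key features, and returning the associated data of the matched target key features as display information, can be sketched as follows (the extraction step is stood in by a pre-extracted element list, and all data is hypothetical):

```python
# Hypothetical end-to-end sketch of the apparatus flow: match extracted
# target video elements against key features, collect associated data.
def process_video(target_elements, association_model):
    display_information = []
    for element in target_elements:
        if element in association_model:      # element matches a key feature
            target_key_feature = element
            display_information.append(association_model[target_key_feature])
    return display_information

association_model = {"sedan": "red sedan, 2019 model",
                     "jacket": "denim jacket, size M"}
elements = ["sedan", "tree", "jacket"]       # stand-in for extraction output
print(process_video(elements, association_model))
# → ['red sedan, 2019 model', 'denim jacket, size M']
```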
In one embodiment, as shown in FIG. 7, the apparatus further includes a sample acquisition module 740 and an extraction model training module 750, wherein:
the sample acquisition module 740 is configured to obtain a training sample set, the training sample set including an annotated image set, an annotated speech set, and an annotated voiceprint set, wherein the annotated images in the annotated image set contain image features, the annotated speech in the annotated speech set contains speech features, and the annotated voiceprints in the annotated voiceprint set contain voiceprint features; the module is further configured to obtain a training video; and
the extraction model training module 750 is configured to input the annotated image set and the training video into an initial image sub-model, which segments the training video to extract segmented images of the training video; to input the annotated speech set and the training video into an initial speech sub-model, which segments the training video to extract segmented audio of the training video; and to input the annotated voiceprint set and the training video into an initial voiceprint sub-model, which segments the training video to extract segmented voiceprints of the training video. The module is further configured to adjust the model parameters of the initial image sub-model according to the segmented images and the image features, adjust the model parameters of the initial speech sub-model according to the segmented audio and the speech features, and adjust the model parameters of the initial voiceprint sub-model according to the segmented voiceprints and the voiceprint features, until the initial image sub-model, the initial speech sub-model, and the initial voiceprint sub-model all meet the convergence condition, yielding the trained extraction model.
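The training loop of module 750, i.e. segment, compare with the annotated feature, adjust parameters, repeat until every sub-model meets the convergence condition, can be caricatured with single-parameter stand-ins for the three sub-models; all values and names here are hypothetical:

```python
# Toy sketch of the training flow: each sub-model is reduced to one
# parameter nudged toward its annotated feature value until all converge.
def train_extraction_model(targets, lr=0.5, tolerance=1e-6, max_epochs=1000):
    """targets: annotated feature value per sub-model."""
    params = {name: 0.0 for name in targets}      # initial sub-models
    for _ in range(max_epochs):
        errors = {}
        for name, target in targets.items():
            errors[name] = params[name] - target  # compare output with annotation
            params[name] -= lr * errors[name]     # adjust the model parameter
        if all(abs(e) < tolerance for e in errors.values()):
            break                                 # convergence condition met
    return params

trained = train_extraction_model(
    {"image": 1.0, "speech": 2.0, "voiceprint": 3.0})
print(trained)
```

Real sub-models would of course segment video frames, audio, and voiceprints and be trained by gradient-based optimization, but the loop structure is the same.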
In one embodiment, as shown in FIG. 7, the apparatus further includes:
the sample acquisition module 740, further configured to obtain samples to be associated and to obtain preset business scenarios;
a data association model building module 760, configured to store the annotated images in the annotated image set, the annotated speech in the annotated speech set, and the annotated voiceprints in the annotated voiceprint set as key features, to convert each type of the samples to be associated into text to obtain and store text information data, and to build the data association model by establishing an association relationship between each key feature and the text information data, the data association model being used to select, according to a key feature, the association relationship corresponding to that key feature and to obtain associated data from the text information data according to the corresponding association relationship; and
a data association model classification module 770, configured to classify the data association model by supervised learning according to the preset business scenarios to obtain multiple data association sub-models, each data association sub-model including key features of a same type and the associated data corresponding to the key features of that type.
For specific limitations on the video information data processing apparatus, reference may be made to the limitations on the video information data processing method above, which are not repeated here. Each module in the video information data processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or be independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided; the computer device may be a terminal whose internal structure is shown in FIG. 8. The computer device includes a processor, a memory, a network interface, a display screen, and an input unit connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device communicates with external terminals through a network connection. The computer program, when executed by the processor, implements a video information data processing method. The display screen of the computer device may be a liquid crystal display or an electronic-ink display; the input unit of the computer device may be a touch layer covering the display screen, a key, trackball, or trackpad arranged on the housing of the computer device, or an external keyboard, trackpad, mouse, or the like.
Those skilled in the art will understand that the structure shown in FIG. 8 is only a block diagram of part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different component arrangement.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program. When executing the computer program, the processor performs the following steps: obtaining the trained extraction model and data association model; inputting a current video into the extraction model and extracting multiple target video elements of the current video; matching each target video element against the key features stored in the data association model, and taking the successfully matched key feature as a target key feature; and obtaining the associated data corresponding to the target key feature stored in the data association model, and displaying the associated data as display information.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program performs the following steps: obtaining the trained extraction model and data association model; inputting a current video into the extraction model and extracting multiple target video elements of the current video; matching each target video element against the key features stored in the data association model, and taking the successfully matched key feature as a target key feature; and obtaining the associated data corresponding to the target key feature stored in the data association model, and displaying the associated data as display information.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application patent shall be subject to the appended claims.

Claims (10)

1. A video information data processing method, the method comprising:
obtaining a trained extraction model and a trained data association model;
inputting a current video into the extraction model, and extracting multiple target video elements of the current video;
matching each of the target video elements against key features stored in the data association model;
taking a successfully matched key feature as a target key feature; and
obtaining associated data corresponding to the target key feature stored in the data association model, and displaying the associated data as display information.
2. The method according to claim 1, wherein the extraction model comprises an image sub-model, a speech sub-model, and a voiceprint sub-model, and training the extraction model comprises:
obtaining a training sample set, the training sample set comprising an annotated image set, an annotated speech set, and an annotated voiceprint set, wherein the annotated images in the annotated image set contain image features, the annotated speech in the annotated speech set contains speech features, and the annotated voiceprints in the annotated voiceprint set contain voiceprint features;
obtaining a training video;
inputting the annotated image set and the training video into an initial image sub-model, the initial image sub-model segmenting the training video to extract segmented images of the training video;
inputting the annotated speech set and the training video into an initial speech sub-model, the initial speech sub-model segmenting the training video to extract segmented audio of the training video;
inputting the annotated voiceprint set and the training video into an initial voiceprint sub-model, the initial voiceprint sub-model segmenting the training video to extract segmented voiceprints of the training video; and
adjusting model parameters of the initial image sub-model according to the segmented images of the training video and the image features, adjusting model parameters of the initial speech sub-model according to the segmented audio of the training video and the speech features, and adjusting model parameters of the initial voiceprint sub-model according to the segmented voiceprints of the training video and the voiceprint features, until the initial image sub-model, the initial speech sub-model, and the initial voiceprint sub-model all meet a convergence condition, obtaining the trained extraction model.
3. The method according to claim 2, wherein the data association model comprises multiple data association sub-models, and building the data association model comprises:
storing the annotated images in the annotated image set, the annotated speech in the annotated speech set, and the annotated voiceprints in the annotated voiceprint set as the key features;
obtaining samples to be associated;
converting each type of the samples to be associated into text to obtain text information data, and storing the text information data;
building the data association model by establishing an association relationship between each key feature and the text information data, the data association model being used to select, according to a key feature, the association relationship corresponding to that key feature, and to obtain associated data from the text information data according to the corresponding association relationship;
obtaining preset business scenarios; and
classifying the data association model by supervised learning according to the preset business scenarios to obtain the multiple data association sub-models, each data association sub-model comprising key features of a same type and associated data corresponding to the key features of that type.
4. The method according to claim 3, wherein before the step of obtaining the trained extraction model and data association model, the method further comprises:
receiving a video extraction instruction, the video extraction instruction carrying an actual business scenario identifier; and
selecting the trained data association model from the data association sub-models according to the actual business scenario identifier.
5. The method according to claim 1, wherein displaying the display information comprises:
obtaining a display mode of the display information;
scrolling the display information across the current video in the form of a barrage according to the display mode; and
popping up a floating frame for a preset time according to the display mode, and showing the display information in the floating frame.
6. A video information data processing apparatus, the apparatus comprising:
a model obtaining module, configured to obtain a trained extraction model and a trained data association model;
a video processing module, configured to input a current video into the extraction model, extract multiple target video elements of the current video, and match each of the target video elements against key features stored in the data association model; and
an information display module, configured to take a successfully matched key feature as a target key feature, obtain associated data corresponding to the target key feature stored in the data association model, and display the associated data as display information.
7. The apparatus according to claim 6, wherein the apparatus further comprises:
a sample acquisition module, configured to obtain a training sample set, the training sample set comprising an annotated image set, an annotated speech set, and an annotated voiceprint set, wherein the annotated images in the annotated image set contain image features, the annotated speech in the annotated speech set contains speech features, and the annotated voiceprints in the annotated voiceprint set contain voiceprint features, the sample acquisition module being further configured to obtain a training video; and
an extraction model training module, configured to input the annotated image set and the training video into an initial image sub-model, the initial image sub-model segmenting the training video to extract segmented images of the training video; to input the annotated speech set and the training video into an initial speech sub-model, the initial speech sub-model segmenting the training video to extract segmented audio of the training video; and to input the annotated voiceprint set and the training video into an initial voiceprint sub-model, the initial voiceprint sub-model segmenting the training video to extract segmented voiceprints of the training video; the extraction model training module being further configured to adjust model parameters of the initial image sub-model according to the segmented images and the image features, adjust model parameters of the initial speech sub-model according to the segmented audio and the speech features, and adjust model parameters of the initial voiceprint sub-model according to the segmented voiceprints and the voiceprint features, until the initial image sub-model, the initial speech sub-model, and the initial voiceprint sub-model all meet a convergence condition, obtaining the trained extraction model.
8. The apparatus according to claim 7, wherein the apparatus further comprises:
the sample acquisition module, further configured to store the annotated images in the annotated image set, the annotated speech in the annotated speech set, and the annotated voiceprints in the annotated voiceprint set as the key features, and further configured to obtain samples to be associated and to obtain preset business scenarios;
a data association model building module, configured to convert each type of the samples to be associated into text to obtain text information data and store the text information data, and further configured to build the data association model by establishing an association relationship between each key feature and the text information data, the data association model being used to select, according to a key feature, the association relationship corresponding to that key feature, and to obtain associated data from the text information data according to the corresponding association relationship; and
a data association model classification module, configured to classify the data association model by supervised learning according to the preset business scenarios to obtain multiple data association sub-models, each data association sub-model comprising key features of a same type and associated data corresponding to the key features of that type.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 5.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5.
CN201910301087.9A 2019-04-15 2019-04-15 Video information data processing method, device, computer equipment and storage medium Pending CN110134830A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910301087.9A CN110134830A (en) 2019-04-15 2019-04-15 Video information data processing method, device, computer equipment and storage medium
PCT/CN2019/122841 WO2020211392A1 (en) 2019-04-15 2019-12-04 Video information data processing method, apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910301087.9A CN110134830A (en) 2019-04-15 2019-04-15 Video information data processing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110134830A true CN110134830A (en) 2019-08-16

Family

ID=67570003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910301087.9A Pending CN110134830A (en) 2019-04-15 2019-04-15 Video information data processing method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110134830A (en)
WO (1) WO2020211392A1 (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130081082A1 (en) * 2011-09-28 2013-03-28 Juan Carlos Riveiro Insua Producing video bits for space time video summary
CN104715023A (en) * 2015-03-02 2015-06-17 北京奇艺世纪科技有限公司 Commodity recommendation method and system based on video content
US20150310012A1 (en) * 2012-12-12 2015-10-29 Odd Concepts Inc. Object-based image search system and search method thereof
WO2017197953A1 (en) * 2016-05-16 2017-11-23 腾讯科技(深圳)有限公司 Voiceprint-based identity recognition method and device
CN107832724A (en) * 2017-11-17 2018-03-23 北京奇虎科技有限公司 The method and device of personage's key frame is extracted from video file
CN108683826A (en) * 2018-05-15 2018-10-19 腾讯科技(深圳)有限公司 Video data handling procedure, device, computer equipment and storage medium
CN109086709A (en) * 2018-07-27 2018-12-25 腾讯科技(深圳)有限公司 Feature Selection Model training method, device and storage medium
CN109286848A (en) * 2018-10-08 2019-01-29 腾讯科技(深圳)有限公司 A kind of exchange method, device and the storage medium of terminal video information
CN109348275A (en) * 2018-10-30 2019-02-15 百度在线网络技术(北京)有限公司 Method for processing video frequency and device
CN109409359A (en) * 2018-09-25 2019-03-01 天津大学 A kind of method for extracting video captions based on deep learning
US20190088262A1 (en) * 2017-09-19 2019-03-21 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for pushing information
CN109543516A (en) * 2018-10-16 2019-03-29 深圳壹账通智能科技有限公司 Signing intention judgment method, device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134830A (en) * 2019-04-15 2019-08-16 深圳壹账通智能科技有限公司 Video information data processing method, device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YU Huangyue; WANG Han; GUO Mengting: "Video key frame extraction based on user interest semantics", Journal of Computer Applications, no. 11 *
NING Yuxi; ZHOU Ming; LI Guangqiang; WANG Ning: "Key information recognition in flight tracking videos based on convolutional neural networks", Journal of Air Force Early Warning Academy, no. 05 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020211392A1 (en) * 2019-04-15 2020-10-22 深圳壹账通智能科技有限公司 Video information data processing method, apparatus, computer device and storage medium
CN112786015A (en) * 2019-11-06 2021-05-11 阿里巴巴集团控股有限公司 Data processing method and device
CN113470123A (en) * 2021-05-08 2021-10-01 广东观止文化网络科技有限公司 Video toning method and device, storage medium and shooting equipment
CN113535702A (en) * 2021-07-22 2021-10-22 北京锐安科技有限公司 Data processing method, device, equipment and storage medium
CN113535702B (en) * 2021-07-22 2024-03-26 北京锐安科技有限公司 Data processing method, device, equipment and storage medium
CN114003160A (en) * 2021-10-29 2022-02-01 影石创新科技股份有限公司 Data visualization display method and device, computer equipment and storage medium
CN114003160B (en) * 2021-10-29 2024-03-29 影石创新科技股份有限公司 Data visual display method, device, computer equipment and storage medium
CN114513681A (en) * 2022-01-25 2022-05-17 武汉工程大学 Video processing system, method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2020211392A1 (en) 2020-10-22

Similar Documents

Publication Publication Date Title
CN110134830A (en) Video information data processing method, device, computer equipment and storage medium
CN109783730A (en) Product recommendation method, apparatus, computer equipment and storage medium
CN109033386A (en) Search ranking method, device, computer equipment and storage medium
CN105160545B (en) Method and device for determining release information style
CN109767261A (en) Product recommendation method, apparatus, computer equipment and storage medium
CN112422831A (en) Video generation method and device, computer equipment and storage medium
CN109447958B (en) Image processing method, image processing device, storage medium and computer equipment
CN114339285B (en) Knowledge point processing method, video processing method, device and electronic equipment
CN112040273B (en) Video synthesis method and device
CN109871843A (en) Character recognition method and device, and device for character recognition
CN111050193A (en) User portrait construction method and device, computer equipment and storage medium
CN109429078A (en) Video processing method and device, and device for video processing
CN109389427A (en) Questionnaire pushing method, device, computer equipment and storage medium
CN109543011A (en) Question and answer data processing method, device, computer equipment and storage medium
CN110458732A (en) Training method, device, computer equipment and storage medium
CN110475140A (en) Barrage data processing method, device, computer readable storage medium and computer equipment
KR20210062522A (en) Control method, device and program of user participation keyword selection system
CN111694603A (en) Screen sharing method and device, computer equipment and storage medium
CN109429077A (en) Video processing method and device, and device for video processing
CN109447412A (en) Method, apparatus, computer equipment and storage medium for constructing a business relationship graph
CN113127628B (en) Method, apparatus, device and computer readable storage medium for generating comments
Yang et al. MetaMP: Metalearning-based multipatch image aesthetics assessment
CN113420203A (en) Object recommendation method and device, electronic equipment and storage medium
CN113438532B (en) Video processing method, video playing method, video processing device, video playing device, electronic equipment and storage medium
CN112948629A (en) Content distribution method and device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination