CN105095211B - Method and device for acquiring multimedia data - Google Patents

Method and device for acquiring multimedia data

Info

Publication number
CN105095211B
Authority
CN
China
Prior art keywords
text information
medium data
mark
user
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410163005.6A
Other languages
Chinese (zh)
Other versions
CN105095211A (en
Inventor
刘巨安
梁汝峰
杨建武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Peking University
Beijing Founder Electronics Co Ltd
Original Assignee
Peking University
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Peking University Founder Group Co Ltd, Beijing Founder Electronics Co Ltd filed Critical Peking University
Priority to CN201410163005.6A priority Critical patent/CN105095211B/en
Publication of CN105095211A publication Critical patent/CN105095211A/en
Application granted granted Critical
Publication of CN105095211B publication Critical patent/CN105095211B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a method and device for acquiring multimedia data. The method comprises: receiving a query request input by a user, the query request including a multimedia information keyword; searching the metadata in a database according to the multimedia information keyword and determining the target identifier corresponding to the text information that matches the keyword, where the metadata includes the text information of multimedia files and the corresponding identifiers; outputting to the user the text information that matches the keyword; receiving the user's confirmation response to the text information; acquiring multimedia data from a server according to the target identifier of the text information corresponding to the confirmation response; and outputting the multimedia data to the user. The invention can acquire multimedia data quickly and accurately according to the text information of multimedia files, better meets users' personalized needs for multimedia data, and gives users a better experience.

Description

Method and device for acquiring multimedia data
Technical field
Embodiments of the present invention relate to the field of computer information technology, and in particular to a method and device for acquiring multimedia data.
Background
With the rapid development of the Internet worldwide, carriers such as news, forums, blogs, microblogs, video, and audio have become the main means by which people today publish and obtain information. Through the Internet, people can easily obtain first-hand domestic news, hot topics, picture information, video information, and so on from a computer screen, which is making the Internet one of the most influential and most promising media in the field of information services and communication.
For traditional text information on the Internet such as news, forums, blogs, and microblogs, Internet public opinion monitoring and analysis systems capable of monitoring massive text data have been developed using data crawling, storage, and analysis techniques for traditional text information. With such a system, people can monitor and analyze Internet data. However, these systems are currently designed only for traditional text information such as news, forums, blogs, and microblogs, and ignore multimedia information on the Internet such as video and audio. As network bandwidth has gradually increased, the click-through rates of Internet video and audio have risen steadily, and the influence of video and audio websites is growing. If monitoring and analysis are carried out only on traditional text information, then, faced with the massive multimedia data on the Internet, accurate monitoring and analysis results for multimedia data cannot be obtained, multimedia data that meets user needs cannot be acquired from such results, and the user experience is poor.
Summary of the invention
Embodiments of the present invention provide a method and device for acquiring multimedia data, which can acquire multimedia data quickly and accurately according to the text information of multimedia files, better meet users' personalized needs for multimedia data, and give users a better experience.
The present invention provides a method for acquiring multimedia data, comprising:
receiving a query request input by a user, the query request including a multimedia information keyword;
searching the metadata in a database according to the multimedia information keyword, and determining the target identifier corresponding to the text information that matches the multimedia information keyword, the metadata including the text information of multimedia files and the corresponding identifiers;
outputting to the user the text information that matches the multimedia information keyword;
receiving the user's confirmation response to the text information;
acquiring multimedia data from a server according to the target identifier of the text information corresponding to the confirmation response;
outputting the multimedia data to the user.
The present invention also provides a device for acquiring multimedia data, comprising:
a receiving module, configured to receive a query request input by a user, the query request including a multimedia information keyword;
a retrieval module, configured to search the metadata in a database according to the multimedia information keyword;
a determining module, configured to determine the target identifier corresponding to the text information that matches the multimedia information keyword, the metadata including the text information of multimedia files and the corresponding identifiers;
an output module, configured to output to the user the text information that matches the multimedia information keyword;
the receiving module being further configured to receive the user's confirmation response to the text information;
an acquiring module, configured to acquire multimedia data from a server according to the target identifier of the text information corresponding to the confirmation response;
the output module being further configured to output the multimedia data to the user.
In the method and device for acquiring multimedia data of the present invention, a query request input by a user is received, the query request including a multimedia information keyword; the metadata in a database is searched according to the multimedia information keyword, and the target identifier corresponding to the text information that matches the keyword is determined, the metadata including the text information of multimedia files and the corresponding identifiers; the text information that matches the keyword is output to the user; the user's confirmation response to the text information is received; multimedia data is acquired from a server according to the target identifier of the text information corresponding to the confirmation response; and the multimedia data is output to the user. The invention can acquire multimedia data quickly and accurately according to the text information of multimedia files, better meets users' personalized needs for multimedia data, and gives users a better experience.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of embodiment one of the method for acquiring multimedia data of the present invention;
Fig. 2 is the first flowchart of embodiment two of the method for acquiring multimedia data of the present invention;
Fig. 3 is the second flowchart of embodiment two of the method for acquiring multimedia data of the present invention;
Fig. 4 is a structural diagram of embodiment one of the device for acquiring multimedia data of the present invention;
Fig. 5 is a structural diagram of embodiment two of the device for acquiring multimedia data of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
In the embodiments of the present invention, a multimedia file may be a video file or an audio file. Specifically, a video file includes video data and text information, where the video data includes the finished video, video key frames, and the video subtitles or speech text. An audio file includes audio data and text information, where the audio data includes the finished audio and the audio text.
Fig. 1 is a flowchart of embodiment one of the method for acquiring multimedia data of the present invention. As shown in Fig. 1, the executing entity of this embodiment is a user terminal with storage space, and the method may be implemented in software on the user terminal. The method includes:
Step 101: receive a query request input by a user, the query request including a multimedia information keyword.
In this embodiment, before the query request is received, multimedia information keywords related to a configuration rule entered by the user may be generated from that rule; the query request that is then received includes these keywords. For example, if the configuration rule entered by the user before the query request is "Beijing housing prices", the multimedia information keywords generated from this rule may be "Beijing housing prices", "National Five Measures", "property tax", and so on. The query request then includes the multimedia information keywords "Beijing housing prices", "National Five Measures", "property tax", and so on.
Step 102: search the metadata in the database according to the multimedia information keyword, and determine the target identifier corresponding to the text information that matches the keyword, the metadata including the text information of multimedia files and the corresponding identifiers.
In this embodiment, the database is an HBase (Hadoop Database) database. HBase is suited to storing unstructured data; it is column-oriented rather than row-oriented, which makes reading and writing large data content easier. A large amount of metadata is stored in the HBase database. The metadata includes the text information of multimedia files and the corresponding identifiers. The text information of a multimedia file includes the title, author, publication time, source website, link address, region information, and summary of the multimedia data, among others. The region information may be the region of the website to which the multimedia data belongs, or the region of a specific channel of that website.
In this embodiment, the metadata in the database is searched periodically according to the multimedia information keyword. Because multimedia files on the Internet are continuously updated, the metadata in the database is continuously updated as well; periodic searching finds the newly updated text information that matches the keyword, which makes the search results more timely and accurate. The search period can be preset, for example to once every 10 minutes; this embodiment places no limit on its length.
In this embodiment, before the metadata in the database is searched, an index is created on the fields of the text information of each metadata record, such as the title, author, publication time, source website, link address, region information, and summary of the multimedia data, which makes searching faster.
Step 103: output to the user the text information that matches the multimedia information keyword.
In this embodiment, the amount of matching text information that is output depends on the multimedia information keywords and on whether they concern a hot topic. When the keywords are few and concern a hot topic, more matching text information is found in the database. The matching text information items are collected and presented to the user as a list.
Step 104: receive the user's confirmation response to the text information.
From the output text information that matches the multimedia information keyword, the user can see from the title, author, and summary of each item what the corresponding multimedia data is about, and confirm one or more items of interest.
Step 105: acquire multimedia data from the server according to the target identifier of the text information corresponding to the confirmation response.
In this embodiment, once the user's confirmation response to the text information has been received, the multimedia data can be acquired from the server according to the target identifier of the corresponding text information, so that the user can view the multimedia data.
Step 106: output the multimedia data to the user.
In this embodiment, the multimedia data is output to the user. When the multimedia data is video data, the user can watch the video, view its key frames, and obtain the video subtitles or speech text. When the multimedia data is audio data, the user can listen to the audio, obtain the audio text, and so on.
In this embodiment, a query request input by a user is received, the query request including a multimedia information keyword; the metadata in the database is searched according to the keyword, and the target identifier corresponding to the matching text information is determined, the metadata including the text information of multimedia files and the corresponding identifiers; the matching text information is output to the user; the user's confirmation response to the text information is received; multimedia data is acquired from the server according to the target identifier of the text information corresponding to the confirmation response; and the multimedia data is output to the user. The invention can acquire multimedia data quickly and accurately according to the text information of multimedia files, better meets users' personalized needs for multimedia data, and gives users a better experience.
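As an illustration only, steps 101 to 106 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: the in-memory `metadata_db` list and `media_server` dictionary stand in for the database and the server, and the names `Metadata`, `search_metadata`, `fetch_media`, and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    identifier: int   # identifier associated with the text information
    title: str
    summary: str


# Hypothetical stand-ins for the database (metadata) and the server (media).
metadata_db = [
    Metadata(1, "Beijing housing prices rise", "Video report on prices"),
    Metadata(2, "Football highlights", "Weekend match summary"),
]
media_server = {1: b"<video bytes 1>", 2: b"<video bytes 2>"}


def search_metadata(keywords):
    """Step 102: return metadata whose title or summary contains any keyword."""
    hits = []
    for record in metadata_db:
        text = f"{record.title} {record.summary}".lower()
        if any(keyword.lower() in text for keyword in keywords):
            hits.append(record)
    return hits


def fetch_media(confirmed):
    """Step 105: fetch media from the server by target identifier."""
    return {r.identifier: media_server[r.identifier] for r in confirmed}


# Steps 101 to 106 for a single query; the user "confirms" the first hit.
hits = search_metadata(["housing prices"])   # steps 101 and 102
listing = [h.title for h in hits]            # step 103: list shown to the user
confirmed = hits[:1]                         # step 104: confirmation response
media = fetch_media(confirmed)               # steps 105 and 106
```

The point of the design is that the search touches only the lightweight text information, and the heavier multimedia data is fetched by identifier only after the user confirms an item.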
Fig. 2 is the first flowchart of embodiment two of the method for acquiring multimedia data of the present invention. As shown in Fig. 2, the executing entity of this embodiment is a user terminal with storage space, and the method may be implemented in software on the user terminal. The method includes:
Step 201: collect multimedia files.
This embodiment is described using the example in which all the collected multimedia files are video files.
In this embodiment, multimedia files are crawled from the Internet and stored on a local disk or a large server. For example, when the multimedia file is a video file, the text information of the video file, the finished video, the video key frames, and the subtitles or speech text of the video are stored. The text information of the video file includes the title, author, publication time, source website, link address, region information, and summary of the video, among others. If the publication time in the text information is empty, the time at which the text information was collected is used as the default publication time, so that the text information is complete. Video key frames are obtained with an automatic key-frame extraction technique, which extracts key frames from the finished video effectively; the extracted key frames are stored. The subtitles or speech text of the video are obtained by applying subtitle and speech recognition to the finished video, converting the subtitles or speech into text, and storing the result in text form.
Step 202: extract the text information and the multimedia data of the multimedia file.
In this embodiment, because of network bandwidth limits, the text information of a multimedia file is often collected first, followed by the finished video; key frames are then extracted from the finished video, and the video subtitles and speech text are recognized from it. The text information and multimedia data of the collected multimedia file are extracted and stored by category.
Step 203: filter the text information.
In this embodiment, a set of garbage words is configured in advance in order to filter out junk information. Text matching is used to determine whether the title and summary of a text information item contain garbage words; if a single item matches more than two different garbage words, it is filtered out. The remaining text information is valid text information.
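The filtering rule of step 203 could be sketched as follows. This is a hedged illustration: the function names, the sample garbage words, and the case-insensitive substring matching are assumptions, and the threshold is exposed as a parameter because the description fixes only the rule "more than two different garbage words".

```python
def is_junk(title, summary, garbage_words, max_distinct_matches=2):
    """Return True when more than `max_distinct_matches` distinct garbage words match."""
    text = f"{title} {summary}".lower()
    matched = {word for word in garbage_words if word.lower() in text}
    return len(matched) > max_distinct_matches


def filter_text_info(items, garbage_words):
    """Keep only the items treated as valid text information."""
    return [
        item for item in items
        if not is_junk(item["title"], item["summary"], garbage_words)
    ]


# Hypothetical garbage words and sample items.
GARBAGE_WORDS = ["lottery", "casino", "free money", "click here"]
items = [
    {"title": "Casino lottery: free money", "summary": "click here"},
    {"title": "Housing policy explained", "summary": "An interview"},
    {"title": "Lottery results", "summary": "Official lottery draw"},
]
valid = filter_text_info(items, GARBAGE_WORDS)
```

Counting distinct words, rather than total occurrences, means that an item repeating a single garbage word is not filtered out.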
Step 204: hash the web page link address of the text information and use the resulting hash value as the identifier of the text information; likewise, hash the web page link address in the association information of the multimedia data and use the resulting hash value as the identifier of the multimedia data.
In this embodiment, after the text information has been filtered, each text information item has a unique web page link address. The hash value of the link address is computed with the MurmurHash algorithm, and this hash value serves as the unique identifier of the text information.
This embodiment uses MurmurHash to hash the web page link address because MurmurHash is a non-cryptographic hash algorithm that has certain performance advantages over traditional algorithms such as CRC32, MD5, and SHA-1, and a relatively low collision rate.
In this embodiment, when multimedia data is stored by category, the association information of each multimedia data item is recorded. The association information includes the storage path of the extracted multimedia data, its file name, and the web page link address. The multimedia data includes the finished video, the video key frames, and the subtitles or speech text of the video. The hash value of the web page link address in the association information is then computed with the MurmurHash algorithm, and this hash value serves as the identifier of the multimedia data.
If a text information item and a multimedia data item belong to the same multimedia file, their web page link addresses are identical, so the hash values computed from those addresses are identical as well. In this way the text information and the multimedia data of the same multimedia file are associated with each other.
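The identifier scheme of step 204 can be illustrated with a pure-Python version of MurmurHash. The description does not say which MurmurHash variant or seed is used, so the 32-bit MurmurHash3 variant with seed 0 is assumed here; `url_identifier` and the sample URL and paths are hypothetical.

```python
def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant), returned as an unsigned integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    nblocks = len(data) // 4
    for i in range(nblocks):                      # body: 4-byte little-endian blocks
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    tail = data[4 * nblocks:]
    if tail:                                      # tail: remaining 1 to 3 bytes
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    h ^= len(data)                                # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def url_identifier(url: str) -> int:
    """Identifier of a text information item or media item: hash of its page URL."""
    return murmur3_32(url.encode("utf-8"))


# Text information and media association info that share one page URL.
text_info = {"title": "Sample video", "url": "http://example.com/video/123"}
media_info = {"path": "/data/video/123.mp4", "filename": "123.mp4",
              "url": "http://example.com/video/123"}
text_id = url_identifier(text_info["url"])
media_id = url_identifier(media_info["url"])      # same URL, same identifier
```

Because both identifiers are derived from the same link address, the text information and the multimedia data can be stored separately and joined later without any shared counter or lookup table.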
Step 205: deduplicate the text information.
Specifically, deduplication of the text information can be divided into the following sub-steps, as shown in Fig. 3.
Step 205a: determine whether the identifier of the text information already exists in memory. If it exists, execute step 205b; if it does not exist, execute step 205c.
In this embodiment, when the user terminal starts, it first reads the identifiers of text information recorded in a physical file; that is, it loads the identifiers of the text information whose upload was completed during the previous run of the user terminal, and places these identifiers in memory.
In memory, the identifiers of text information are stored according to the publication time of the text information. As an example, suppose that memory holds the identifiers of text information published within the last three days. These identifiers are divided by publication time, in units of one hour, into 72 blocks, and the identifiers of expired text information are evicted periodically. Storing the identifiers in hourly blocks improves lookup speed.
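The hourly block layout can be sketched as below. This is an illustrative assumption about the data structure, not the structure used by the patent: the class name `HourlyIdBuckets`, the use of a dictionary of sets, and the choice to look an identifier up by its publication hour are all hypothetical.

```python
from datetime import datetime, timedelta


class HourlyIdBuckets:
    """Keeps identifiers grouped by publication hour for a sliding window."""

    def __init__(self, window_hours=72):
        self.window_hours = window_hours
        self.buckets = {}  # hour start (datetime) -> set of identifiers

    @staticmethod
    def _hour(ts):
        return ts.replace(minute=0, second=0, microsecond=0)

    def add(self, identifier, published_at):
        self.buckets.setdefault(self._hour(published_at), set()).add(identifier)

    def contains(self, identifier, published_at):
        # Only the block for the item's publication hour needs to be checked.
        return identifier in self.buckets.get(self._hour(published_at), set())

    def evict_expired(self, now):
        # Keep the current hour and the 71 hours before it (72 blocks in total).
        cutoff = self._hour(now) - timedelta(hours=self.window_hours)
        for hour in [h for h in self.buckets if h <= cutoff]:
            del self.buckets[hour]


now = datetime(2014, 4, 22, 12, 30)
recent = now - timedelta(hours=1)
old = now - timedelta(hours=80)      # published more than three days ago
store = HourlyIdBuckets()
store.add(1001, recent)
store.add(1002, old)
store.evict_expired(now)
```

With this layout, eviction removes whole blocks at once instead of scanning individual identifiers.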
After the user terminal has started, it determines whether the identifier of each newly updated text information item already exists in memory.
In this embodiment, a Bloom filter is used to check whether the identifier of an updated text information item exists in memory. The basic idea of the Bloom filter is to use a hash function to map an element to a position in an array of length m: if that position is 1, the element is considered to be in the set; otherwise it is not. The drawback of using a single hash function is that collisions become likely when many elements are checked. The remedy is to use k hash functions that map each element to k positions: if all k positions are 1, the element is considered to be in the set (false positives remain possible but are less likely); if any position is 0, the element is definitely not in the set.
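A minimal Bloom filter along these lines might look as follows. The sketch makes two simplifying assumptions that are not in the description: the k hash functions are derived by salting SHA-256 with the function index, and the bit array is stored as one byte per position for readability. The default values of `m` and `k` are arbitrary.

```python
import hashlib


class BloomFilter:
    """Bloom filter with an m-position array and k derived hash functions."""

    def __init__(self, m=1024, k=3):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per position, for clarity

    def _positions(self, item: str):
        # Assumption: k hash functions are simulated by salting SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("identifier-12345")
empty = BloomFilter()
```

A membership test never misses an element that was added, so a negative answer lets the deduplication step skip any further lookup for that identifier.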
Step 205b: filter out the text information.
Step 205c: determine whether the publication time of the text information falls within a preset time window. If so, execute step 205d; if not, execute step 205e.
Step 205d: record the identifier of the text information in memory, and execute step 205f.
In this embodiment, the identifier of the text information is recorded in memory so that it can be used in subsequent in-memory deduplication checks.
Step 205e: determine whether the identifier of the text information already exists in the database. If it exists, execute step 205b; if it does not exist, execute step 205f.
Step 205f: store the text information and its identifier in association in the database.
In this embodiment, when text information is deduplicated, the first check, made before updated text information enters the database, is whether its identifier already exists in memory. This is because updates to multimedia files on the Internet generally concern recently published files, and the identifiers held in memory are also those of recently published text information. The metadata stored in the HBase database is very large, so looking up text information directly in HBase would take time and put query pressure on the database. Performing in-memory deduplication first, before updated text information enters the database, effectively reduces the query pressure caused by deduplicating directly against HBase.
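The decision flow of steps 205a to 205f can be condensed into one function. This is a sketch only: plain Python sets stand in for the in-memory identifier store and for the database, and the function name, return values, and three-day default window are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def deduplicate(identifier, published_at, now, memory_ids, database_ids,
                window=timedelta(days=3)):
    """Return "filtered" or "stored" following steps 205a to 205f."""
    if identifier in memory_ids:                  # 205a -> 205b
        return "filtered"
    if now - published_at <= window:              # 205c -> 205d
        memory_ids.add(identifier)
    elif identifier in database_ids:              # 205e -> 205b
        return "filtered"
    database_ids.add(identifier)                  # 205f
    return "stored"


now = datetime(2014, 4, 22, 12, 0)
memory_ids, database_ids = {1}, {1, 2}
r_known = deduplicate(1, now, now, memory_ids, database_ids)
r_recent = deduplicate(3, now - timedelta(hours=5), now, memory_ids, database_ids)
r_old_dup = deduplicate(2, now - timedelta(days=10), now, memory_ids, database_ids)
r_old_new = deduplicate(4, now - timedelta(days=10), now, memory_ids, database_ids)
```

Only older items whose identifiers are not in memory reach the database lookup, which is how the in-memory check keeps most of the query load away from the database.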
In steps 202 to 205 of this embodiment, the processing of the text information (extraction, filtering, hash calculation, and deduplication) and the processing of the multimedia data (extraction and hash calculation) may be carried out simultaneously or one after the other; this embodiment places no restriction on the order.
Step 206: store the text information and its identifier in association in the database.
In this embodiment, after the text information has been deduplicated, the text information and its identifier are stored in association in the database, which is an HBase (Hadoop Database) database. Each text information item together with its identifier forms one metadata record.
Step 207: create an index for the text information in the database.
In this embodiment, an index is created for the text information in the database, for example on the title, author, publication time, source website, link address, region information, and summary fields of the text information of each metadata record. Once the index has been created, subsequent searches by multimedia information keyword are faster and more efficient.
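The description does not specify the indexing technology, so the following is only a toy inverted index that shows why indexed fields make keyword lookups fast. The field subset in `INDEXED_FIELDS`, the whitespace tokenization, and the sample records are assumptions.

```python
from collections import defaultdict

# Assumed subset of the text information fields named in the description.
INDEXED_FIELDS = ("title", "author", "site", "region", "summary")


def build_index(records):
    """Map each lower-cased token to the identifiers whose fields contain it."""
    index = defaultdict(set)
    for identifier, fields in records.items():
        for name in INDEXED_FIELDS:
            for token in fields.get(name, "").lower().split():
                index[token].add(identifier)
    return index


def lookup(index, keyword):
    """Return the sorted identifiers of records that contain the keyword."""
    return sorted(index.get(keyword.lower(), set()))


records = {
    101: {"title": "Beijing housing prices", "author": "Li",
          "site": "example-video", "region": "Beijing",
          "summary": "Report on property tax"},
    102: {"title": "Football highlights", "author": "Wang",
          "site": "example-sports", "region": "Shanghai",
          "summary": "Weekend match"},
}
index = build_index(records)
```

A keyword lookup then reads a single index entry, for example `lookup(index, "tax")`, instead of scanning every metadata record.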
Step 208: determine whether the database contains an identifier identical to the identifier of the multimedia data. If it does, execute step 209; if it does not, execute step 210.
Step 209: store the multimedia data and its identifier in association on the server, store the identifier of the multimedia data and its storage address on the server in association in the database, and execute step 211.
In this embodiment, the server is a server such as Tomcat. When multimedia data is stored on the server, the storage address of each multimedia data item on the server is recorded. The identifier of the multimedia data and its storage address on the server are stored in the database, so that when the user views multimedia data, the data can be fetched from the server using the identifier and storage address held in the database.
Step 210: back up the identifier of the multimedia data, and execute step 208.
In this embodiment, after the identifier of the multimedia data has been backed up, and once the database has been checked for an identifier identical to that of the previous multimedia data item, the database is checked again for an identifier identical to that of this multimedia data item.
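Steps 208 to 210 can be sketched as a single function that either stores the media or parks its identifier for a later retry. Everything here is a stand-in chosen for illustration: dictionaries and sets replace the server and the database, and the function name and the `/media/<identifier>` address scheme are hypothetical.

```python
def store_media(media_id, payload, text_ids, server_files, address_table, pending):
    """Steps 208 to 210: store media only once its text information is in the database."""
    if media_id not in text_ids:            # step 208 -> step 210
        pending.add(media_id)               # back up the identifier for a later retry
        return None
    address = f"/media/{media_id}"          # hypothetical storage address scheme
    server_files[address] = payload         # step 209: store the media on the server
    address_table[media_id] = address       # step 209: identifier -> address in database
    pending.discard(media_id)
    return address


text_ids = {11}                             # identifiers already in the database
server_files, address_table, pending = {}, {}, set()

first_try = store_media(12, b"clip", text_ids, server_files, address_table, pending)
pending_after_first = set(pending)
text_ids.add(12)                            # the text information arrives later
second_try = store_media(12, b"clip", text_ids, server_files, address_table, pending)
```

Retrying from the backed-up identifiers copes with the case where the multimedia data is processed before its text information has been written to the database.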
Step 211: determine whether the multimedia data is video subtitles or speech text. If so, execute step 212; if not, execute step 213.
Step 212: add a video subtitle or speech text field to the text information with the corresponding identifier, and create an index for that field.
In this embodiment, a video subtitle or speech text field is added to the text information with the corresponding identifier and an index is created for that field, so that the user can search the text information by keywords taken from the speech or subtitle content.
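Step 212 could be illustrated as follows, again as a sketch under assumptions: the text information is a plain dictionary, the index is a dictionary from token to identifiers, and the field name `transcript` and the function name are hypothetical.

```python
def add_transcript_field(record, index, identifier, transcript):
    """Step 212: attach the subtitle or speech text and index its tokens."""
    record["transcript"] = transcript
    for token in transcript.lower().split():
        index.setdefault(token, set()).add(identifier)


record = {"title": "Evening news"}
index = {"evening": {7}, "news": {7}}       # index built from the title only
add_transcript_field(record, index, 7, "Property tax pilot expands")
```

After this step a search for a word that is only spoken in the video, such as "tax", also reaches the record.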
Step 213 is identical to step 101 of embodiment one of the method for acquiring multimedia data of the present invention and is not repeated here.
Step 214: search the metadata in the database according to the multimedia information keyword, and determine the target identifier corresponding to the text information that matches the keyword, the metadata including the text information of multimedia files and the corresponding identifiers.
Further, because a video subtitle or speech text field has been added to the text information and indexed, the user can search the metadata in the database using keywords from the video subtitles or speech text of the multimedia information, and can retrieve text information that is both more accurate and more comprehensive.
Step 215: output to the user the text information that matches the multimedia information keyword.
Further, after the matching text information has been output to the user, statistics are offered along several dimensions such as time, website, and region. The dimension selected by the user is received, the text information is counted along that dimension, and the statistical result is output, so that the user can monitor and analyze multimedia data on the basis of the result.
For example, suppose 10,000 text information items matching "Beijing housing prices", "National Five Measures", and "property tax" are output to the user, and the user chooses to count these 10,000 items along the time dimension. A curve of the number of videos published in each time period can then be output to the user, which helps the user analyze how multimedia data about "Beijing housing prices", "National Five Measures", and "property tax" is distributed on the Internet.
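The per-dimension statistics can be sketched with a counter keyed by the selected dimension. The field names, the daily granularity of the time dimension, and the sample records are assumptions; the description does not fix the bucket size or the output format.

```python
from collections import Counter
from datetime import datetime

# Assumed extractors for the three dimensions named in the description.
DIMENSIONS = {
    "time": lambda r: r["published_at"].strftime("%Y-%m-%d"),
    "site": lambda r: r["site"],
    "region": lambda r: r["region"],
}


def count_by(records, dimension):
    """Count matching text information items along the selected dimension."""
    counts = Counter(DIMENSIONS[dimension](r) for r in records)
    return sorted(counts.items())


records = [
    {"published_at": datetime(2014, 4, 20, 9, 0), "site": "site-a", "region": "Beijing"},
    {"published_at": datetime(2014, 4, 20, 18, 0), "site": "site-b", "region": "Beijing"},
    {"published_at": datetime(2014, 4, 21, 8, 0), "site": "site-a", "region": "Shanghai"},
]
by_time = count_by(records, "time")
by_site = count_by(records, "site")
```

The sorted `(bucket, count)` pairs for the time dimension are the points of the publication curve mentioned in the example.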
Steps 216 to 218 are identical to steps 104 to 106 of embodiment one of the method for acquiring multimedia data of the present invention and are not repeated here.
In this embodiment, the hash values of the web page link addresses of the text information and of the multimedia data are computed separately, which associates the text information and the multimedia data of each multimedia file, so that multimedia data can be acquired according to the text information that was retrieved. Adding video subtitle and speech text fields to the text information makes the search results more accurate and more comprehensive. Deduplicating the text information in memory before the text information and its identifier are stored in the database effectively reduces the query pressure caused by deduplicating directly against the HBase database. Storing the storage address of the multimedia data on the server in the database makes it possible to quickly acquire the multimedia data the user is interested in, using the target identifier of the output text information and the storage address on the server, which gives the user a better experience.
Fig. 4 is the structural schematic diagram of the acquisition device embodiment one of multi-medium data of the present invention, as shown in figure 4, more matchmakers The acquisition device of volume data includes: receiving module 401, retrieval module 402, determining module 403, output module 404 and obtains mould Block 405.Wherein, receiving module 401, for receiving the inquiry request of user's input, inquiry request includes multimedia messages key Word.Retrieval module 402, for being retrieved to the metadata in database according to multimedia messages keyword.Determining module 403, for determining purpose mark corresponding with the text information that multimedia messages keyword matches, metadata includes multimedia The text information of file and its corresponding mark.Output module 404 is used for user's output and multimedia messages keyword phase The text information matched.Receiving module 401 is also used to receive user and responds to the confirmation of text information.Module 405 is obtained, is used for Multi-medium data is obtained from server according to the purpose mark of text information corresponding with confirmation response.Output module 404, For multi-medium data to be exported to user.
The device of this embodiment can carry out the technical solution of the method embodiment shown in Fig. 1. Its implementation principle and technical effect are similar and are not repeated here.
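As an illustration only, the cooperation of the receiving, retrieval, determining, output, and obtaining modules can be sketched in Python. The class name `MediaStore`, the dictionaries standing in for the database and the server, and the case-insensitive substring match are assumptions made for this sketch; the patent does not prescribe a matching algorithm or a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class MediaStore:
    # Hypothetical stand-in for the metadata database: identifier -> text information.
    metadata: dict = field(default_factory=dict)
    # Hypothetical stand-in for the server: identifier -> multimedia data.
    server: dict = field(default_factory=dict)

    def search(self, keyword: str) -> list:
        """Retrieve metadata and determine the target identifiers whose
        text information matches the keyword (substring match is assumed)."""
        needle = keyword.lower()
        return [(ident, text) for ident, text in self.metadata.items()
                if needle in text.lower()]

    def fetch(self, ident: str):
        """Obtain multimedia data from the server by target identifier."""
        return self.server.get(ident)


store = MediaStore(
    metadata={"id-1": "Lecture on hash tables", "id-2": "Cooking show, episode 3"},
    server={"id-1": b"<video bytes>", "id-2": b"<other bytes>"},
)

# 1. The user submits a keyword; matching text information is output.
matches = store.search("hash")
# 2. The user confirms one result (here, simply the first one).
confirmed_id, _ = matches[0]
# 3. The multimedia data is fetched by the confirmed target identifier.
data = store.fetch(confirmed_id)
```

The two-step shape (text information first, multimedia data only after confirmation) mirrors the flow described above: the heavier multimedia payload is fetched only for the result the user actually chooses.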
Fig. 5 is a schematic structural diagram of embodiment two of the multimedia data acquisition device of the present invention. As shown in Fig. 5, the multimedia data acquisition device includes a receiving module 501, a retrieval module 502, a determining module 503, an output module 504, an obtaining module 505, an acquisition module 506, an extraction module 507, a computing module 508, a storage module 509, an adding module 510, and a judgment module 511.
The receiving module 501 is configured to receive an inquiry request input by a user, the inquiry request including a multimedia information keyword. The retrieval module 502 is configured to retrieve the metadata in the database according to the multimedia information keyword. The determining module 503 is configured to determine the target identifier corresponding to the text information that matches the multimedia information keyword; the metadata includes the text information of multimedia files and the corresponding identifiers. The output module 504 is configured to output to the user the text information that matches the multimedia information keyword. The receiving module 501 is further configured to receive the user's confirmation response to the text information. The obtaining module 505 is configured to obtain multimedia data from the server according to the target identifier of the text information corresponding to the confirmation response. The output module 504 is further configured to output the multimedia data to the user.
Further, the acquisition module 506 is configured to collect multimedia files before the receiving module receives the inquiry request input by the user, the inquiry request including a multimedia information keyword.
The extraction module 507 is configured to extract the text information of the multimedia file.
The computing module 508 is configured to perform a hash calculation on the web page link address of the text information and to use the resulting hash value as the identifier corresponding to the text information.
The storage module 509 is configured to store the text information and its corresponding identifier in association in the database.
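The identifier computation described above can be sketched as follows. The patent does not name a hash function, so the use of SHA-256 here is an assumption, as are the function names, the example URL, and the dictionary standing in for the database.

```python
import hashlib


def url_identifier(url: str) -> str:
    """Hash a web page link address into a fixed-length identifier.
    SHA-256 is an assumption; the source only says 'hash calculation'."""
    return hashlib.sha256(url.strip().encode("utf-8")).hexdigest()


# Hypothetical stand-in for the database: identifier -> text information.
database = {}


def store_text_information(text_info: dict) -> str:
    """Store text information in association with its identifier."""
    ident = url_identifier(text_info["url"])
    database[ident] = text_info
    return ident


page_url = "http://example.com/news/123"  # illustrative address only
text_id = store_text_information({"url": page_url, "title": "Example news"})

# Multimedia data found on the same page hashes to the same identifier,
# which is what links it back to the stored text information.
media_id = url_identifier(page_url)
```

Because both sides hash the same web page link address, no separate join table is needed to associate text information with multimedia data: equal identifiers imply the same source page.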
Further, the extraction module 507 is also configured to extract the multimedia data of the multimedia file after the acquisition module has collected the multimedia file.
The computing module 508 is also configured to perform a hash calculation on the web page link address in the association information of the multimedia data, the resulting hash value serving as the identifier corresponding to the multimedia data.
The storage module 509 is also configured to store the multimedia data and its corresponding identifier in association on the server if an identifier identical to the identifier corresponding to the multimedia data exists in the database.
The storage module 509 is also configured to store the identifier corresponding to the multimedia data, in association with the storage address on the server, in the database.
Preferably, the adding module 510 is configured to add the video caption or speech text field to the text information of the corresponding identifier if the multimedia data is a video caption or speech text. It does so after the multimedia data and its corresponding identifier have been stored in association on the server, which happens only when an identifier identical to the identifier corresponding to the multimedia data exists in the database.
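A minimal sketch of this conditional storage step, under stated assumptions: the dictionaries `database` and `server`, the function `store_multimedia`, the `kind` labels, and the example storage address are all hypothetical names introduced for illustration, not part of the patent.

```python
# Hypothetical database: identifier -> record with text information and address.
database = {"abc123": {"text": {"title": "Example clip"}, "address": None}}
# Hypothetical server: storage address -> (identifier, multimedia data).
server = {}


def store_multimedia(ident: str, data: str, kind: str, storage_address: str) -> bool:
    """Store multimedia data only if its identifier already exists in the database."""
    if ident not in database:
        return False  # no matching text information, so nothing is stored
    # Store the data together with its identifier on the server.
    server[storage_address] = (ident, data)
    # Record the identifier -> storage address association in the database.
    database[ident]["address"] = storage_address
    # Captions and speech text become extra searchable fields of the text information.
    if kind in ("video_caption", "speech_text"):
        database[ident]["text"][kind] = data
    return True


stored = store_multimedia("abc123", "hello world", "speech_text", "/media/abc123.txt")
skipped = store_multimedia("unknown", "orphan data", "video", "/media/unknown.bin")
```

In this sketch the existence check acts as a filter: multimedia data whose source page has no stored text information is dropped, so every item on the server can be reached from a search result.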
Further, the judgment module 511 is configured to judge whether the identifier corresponding to the text information already exists in memory, before the storage module stores the text information and its corresponding identifier in association in the database.
The judgment module 511 is also configured to judge whether the identifier corresponding to the text information already exists in the database if that identifier does not exist in memory.
The storage module 509 is also configured to store the text information and its corresponding identifier in association in the database if the identifier corresponding to the text information does not exist in the database.
The device of this embodiment can carry out the technical solutions of the method embodiments shown in Fig. 2 and Fig. 3. Its implementation principle and technical effect are similar and are not repeated here.
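The two-level duplicate check performed by the judgment module and the storage module can be sketched as below. The class name `Deduplicator`, the in-memory set, the lookup counter, and the choice to remember every identifier in memory once it has been checked are assumptions of this sketch; the patent states only the order of the checks (memory first, then the database).

```python
class Deduplicator:
    """Check memory first, then the database, before storing text information."""

    def __init__(self, database: dict):
        self.seen_in_memory = set()   # identifiers already checked in this run (assumed)
        self.database = database      # hypothetical stand-in for the HBase table
        self.database_lookups = 0     # counts how often the database is queried

    def store_if_new(self, ident: str, text_info: dict) -> bool:
        # Level 1: cheap in-memory check.
        if ident in self.seen_in_memory:
            return False
        self.seen_in_memory.add(ident)
        # Level 2: database check, reached only on an in-memory miss.
        self.database_lookups += 1
        if ident in self.database:
            return False
        self.database[ident] = text_info
        return True


dedup = Deduplicator(database={"old": {"title": "Existing"}})
results = [dedup.store_if_new(i, {"title": i}) for i in ["new", "new", "old", "old"]]
```

Four store attempts cause only two database lookups here, which illustrates why deduplicating in memory first relieves the database of repeated existence queries.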
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium capable of storing program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A method for acquiring multimedia data, characterized by comprising:
Receiving an inquiry request input by a user, the inquiry request including a multimedia information keyword;
Retrieving metadata in a database according to the multimedia information keyword, and determining a target identifier corresponding to the text information that matches the multimedia information keyword, the metadata including the text information of multimedia files and the identifiers corresponding to that text information;
Outputting to the user the text information that matches the multimedia information keyword;
Receiving the user's confirmation response to the text information;
Obtaining multimedia data from a server according to the target identifier of the text information corresponding to the confirmation response;
Outputting the multimedia data to the user;
Wherein, before the receiving of the inquiry request input by the user, the method comprises: receiving a configuration rule input by the user, and generating a multimedia information keyword related to the configuration rule;
Wherein, before the receiving of the inquiry request input by the user, the inquiry request including a multimedia information keyword, the method further comprises:
Collecting a multimedia file;
Extracting the text information of the multimedia file;
Performing a hash calculation on the web page link address of the text information, and using the resulting hash value as the identifier corresponding to the text information;
Storing the text information and its corresponding identifier in association in the database;
And wherein, after the collecting of the multimedia file, the method further comprises:
Extracting the multimedia data of the multimedia file;
Performing a hash calculation on the web page link address in the association information of the multimedia data, the resulting hash value serving as the identifier corresponding to the multimedia data;
If an identifier identical to the identifier corresponding to the multimedia data exists in the database, storing the multimedia data and its corresponding identifier in association on the server;
Storing the identifier corresponding to the multimedia data, in association with the storage address on the server, in the database.
2. The method according to claim 1, characterized in that, after the storing of the multimedia data and its corresponding identifier in association on the server if an identifier identical to the identifier corresponding to the multimedia data exists in the database, the method further comprises:
If the multimedia data is a video caption or speech text, adding the video caption or speech text field to the text information of the corresponding identifier.
3. The method according to any one of claims 1-2, characterized in that, before the storing of the text information and its corresponding identifier in association in the database, the method further comprises:
Judging whether the identifier corresponding to the text information already exists in memory;
If the identifier corresponding to the text information does not exist in the memory, judging whether the identifier corresponding to the text information already exists in the database;
If the identifier corresponding to the text information does not exist in the database, storing the text information and its corresponding identifier in association in the database.
4. A device for acquiring multimedia data, characterized by comprising:
A receiving module, configured to receive an inquiry request input by a user, the inquiry request including a multimedia information keyword;
A retrieval module, configured to retrieve metadata in a database according to the multimedia information keyword;
A determining module, configured to determine a target identifier corresponding to the text information that matches the multimedia information keyword, the metadata including the text information of multimedia files and the identifiers corresponding to that text information;
An output module, configured to output to the user the text information that matches the multimedia information keyword;
The receiving module being further configured to receive the user's confirmation response to the text information;
An obtaining module, configured to obtain multimedia data from a server according to the target identifier of the text information corresponding to the confirmation response;
The output module being further configured to output the multimedia data to the user;
The receiving module being further configured to receive a configuration rule input by the user;
The device further comprising: a generation module, configured to generate a multimedia information keyword related to the configuration rule;
And further comprising:
An acquisition module, configured to collect a multimedia file before the receiving module receives the inquiry request input by the user, the inquiry request including a multimedia information keyword;
An extraction module, configured to extract the text information of the multimedia file;
A computing module, configured to perform a hash calculation on the web page link address of the text information and to use the resulting hash value as the identifier corresponding to the text information;
A storage module, configured to store the text information and its corresponding identifier in association in the database;
The extraction module being further configured to extract the multimedia data of the multimedia file after the acquisition module has collected the multimedia file;
The computing module being further configured to perform a hash calculation on the web page link address in the association information of the multimedia data, the resulting hash value serving as the identifier corresponding to the multimedia data;
The storage module being further configured to store the multimedia data and its corresponding identifier in association on the server if an identifier identical to the identifier corresponding to the multimedia data exists in the database;
The storage module being further configured to store the identifier corresponding to the multimedia data, in association with the storage address on the server, in the database.
5. The device according to claim 4, characterized by further comprising:
An adding module, configured to, after the multimedia data and its corresponding identifier have been stored in association on the server because an identifier identical to the identifier corresponding to the multimedia data exists in the database, add the video caption or speech text field to the text information of the corresponding identifier if the multimedia data is a video caption or speech text.
6. The device according to any one of claims 4-5, characterized by further comprising:
A judgment module, configured to judge whether the identifier corresponding to the text information already exists in memory, before the storage module stores the text information and its corresponding identifier in association in the database;
The judgment module being further configured to judge whether the identifier corresponding to the text information already exists in the database if the identifier corresponding to the text information does not exist in the memory;
The storage module being further configured to store the text information and its corresponding identifier in association in the database if the identifier corresponding to the text information does not exist in the database.
CN201410163005.6A 2014-04-22 2014-04-22 The acquisition methods and device of multi-medium data Expired - Fee Related CN105095211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410163005.6A CN105095211B (en) 2014-04-22 2014-04-22 The acquisition methods and device of multi-medium data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410163005.6A CN105095211B (en) 2014-04-22 2014-04-22 The acquisition methods and device of multi-medium data

Publications (2)

Publication Number Publication Date
CN105095211A CN105095211A (en) 2015-11-25
CN105095211B true CN105095211B (en) 2019-03-26

Family

ID=54575680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410163005.6A Expired - Fee Related CN105095211B (en) 2014-04-22 2014-04-22 The acquisition methods and device of multi-medium data

Country Status (1)

Country Link
CN (1) CN105095211B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956050B (en) * 2016-04-26 2019-07-23 珠海豹趣科技有限公司 A kind of method of data capture, device and equipment
CN114328996A (en) * 2016-05-12 2022-04-12 杭州网易云音乐科技有限公司 Method and device for publishing information
CN108205546B (en) * 2016-12-16 2021-01-12 北京酷我科技有限公司 Song information matching system and method
CN107038238A (en) * 2017-04-19 2017-08-11 深圳市茁壮网络股份有限公司 A kind of association of data, player method and device
CN109002447A (en) * 2017-06-07 2018-12-14 中兴通讯股份有限公司 A kind of information collection method for sorting and device
CN107770624B (en) * 2017-10-24 2021-03-05 中国移动通信集团公司 Method and device for playing multimedia file in live broadcast process and storage medium
CN108090139B (en) * 2017-11-30 2021-10-01 北京邮电大学 File retrieval method and device
CN108549711A (en) * 2018-04-20 2018-09-18 广东工业大学 A kind of method, apparatus, equipment and the storage medium of storage big data
CN108829765A (en) * 2018-05-29 2018-11-16 平安科技(深圳)有限公司 A kind of information query method, device, computer equipment and storage medium
CN108829817A (en) * 2018-06-12 2018-11-16 海南省火蓝数据有限公司 A kind of Content Management System melting media for big data
CN111367870A (en) * 2018-12-25 2020-07-03 深圳市优必选科技有限公司 Method, device and system for sharing picture book
CN109684492B (en) * 2018-12-28 2022-03-04 北京爱奇艺科技有限公司 Multimedia file searching method and device and electronic equipment
CN109561323B (en) * 2019-01-02 2021-11-12 武汉珈铭汉象教育科技有限公司 MP4 file encryption and decryption method and device
CN110147467A (en) * 2019-04-11 2019-08-20 北京达佳互联信息技术有限公司 A kind of generation method, device, mobile terminal and the storage medium of text description
CN110489653A (en) * 2019-08-23 2019-11-22 北京金堤科技有限公司 Public feelings information querying method and device, system, electronic equipment, storage medium
CN111159435B (en) * 2019-12-27 2023-09-05 新方正控股发展有限责任公司 Multimedia resource processing method, system, terminal and computer readable storage medium
CN111680072B (en) * 2020-05-07 2023-12-08 国家计算机网络与信息安全管理中心 System and method for dividing social information data
CN112052375B (en) 2020-09-30 2024-06-11 北京百度网讯科技有限公司 Public opinion acquisition and word viscosity model training method and device, server and medium
CN113626622B (en) * 2021-07-29 2024-07-19 网易有道信息技术(江苏)有限公司 Multimedia data display method in interactive teaching and related equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102033923A (en) * 2010-12-14 2011-04-27 百度时代网络技术(北京)有限公司 Method and device for searching and displaying online videos

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100578503C (en) * 2007-03-26 2010-01-06 徐礼岗 Quality and on-line number P2P video frequency search method
CN101620608A (en) * 2008-07-04 2010-01-06 全国组织机构代码管理中心 Information collection method and system
CN102014081B (en) * 2010-12-23 2013-04-03 汉王科技股份有限公司 Method, device and system for playing song segment in instant communication system

Also Published As

Publication number Publication date
CN105095211A (en) 2015-11-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220627

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: Peking University

Patentee after: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

Address before: 100871, Beijing, Haidian District, Cheng Fu Road, No. 298, Zhongguancun Fangzheng building, 9 floor

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: Peking University

Patentee before: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190326