WO2007147359A1 - System and method for correcting multimedia file information - Google Patents

System and method for correcting multimedia file information

Info

Publication number
WO2007147359A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
multimedia file
name
corrected
module
Application number
PCT/CN2007/070114
Other languages
English (en)
Chinese (zh)
Inventor
Xiangxin Yu
Ying Xiong
Zhiyuan Liu
Original Assignee
Tencent Technology (Shenzhen) Company Limited
Priority date: 2006-06-15 (as listed by Google Patents; not a legal conclusion)
Filing date: 2007-06-14
Publication date: 2007-12-27
Application filed by Tencent Technology (Shenzhen) Company Limited

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • The present invention relates to the technical field of correcting multimedia file information, and in particular to a multimedia file information correction system and a multimedia file information correction method.
Background of the invention
  • FIG. 1 is a schematic diagram of the results obtained by searching with the prior art. As FIG. 1 shows, these results have the following problems:
  • “dancing mother”, listed as part of the song name, is actually the album name;
  • “Love Entertainment” in the song name is the name of the website that provides the song, not part of the song name, and is unwanted advertising text;
  • the song name is not completely correct, and the entry lacks the artist name and the album name.
  • Another method crawls music websites and then extracts the music file information from the crawled pages with manually maintained templates. Although this method obtains more accurate information, different websites usually require different templates, so the maintenance workload is very large; as a result, the method is inefficient and can only collect music file information from a small number of websites.
Summary of the invention
  • The embodiments of the invention provide a multimedia file information correction system intended to provide more accurate multimedia file information effectively.
  • The embodiments of the invention also provide a multimedia file information correction method for effectively providing more accurate multimedia file information.
  • An embodiment of the present invention provides a multimedia file information correction system comprising an information saving module and an information correction module, where:
  • the information saving module is configured to save the determined information of multimedia files, that is, information already confirmed to be correct;
  • the information correction module is configured to search the information saving module for determined information related to the multimedia file to be corrected, and to replace the information of the multimedia file to be corrected with the determined information found.
  • An embodiment of the invention further provides a multimedia file information correction method, comprising: A. based on the information of the multimedia file to be corrected, searching the determined information of multimedia files for determined information related to it; and B. replacing the information of the multimedia file to be corrected with the determined information found.
  • The present invention thus uses the determined information of multimedia files: related determined information is found according to the multimedia file to be corrected, and the information of that file is then replaced with the determined information found.
  • Because implementing this technical solution does not depend on manually maintained templates, the embodiments of the present invention are more efficient.
  • The technical solution of the embodiments can be applied in many areas, such as correcting the music file information returned by a search engine and correcting multimedia file information on a device such as a local computer.
  • FIG. 1 is a schematic diagram of music file information obtained by searching with the prior art;
  • FIG. 2 is a schematic structural diagram of a multimedia file information correction system according to an embodiment of the present invention;
  • FIG. 3 is a schematic flowchart of a multimedia file information correction method according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a multimedia file information correction system according to an embodiment of the present invention. As shown in FIG. 2, the system includes at least an information saving module and an information correction module, and may further include a word segmentation module, a filtering module, an output module, and so on.
  • The multimedia file information correction system of the embodiment can be used in many scenarios, such as correcting the information of multimedia files (for example, music files) found by an existing search engine, correcting the information of multimedia files on a device such as a local computer, and correcting multimedia file information in other situations.
  • This embodiment takes as its example the correction of music file information found by a search engine. Those skilled in the art can substitute other multimedia files for music files, and the corresponding information of those files for music file information.
  • The multimedia file information correction system is further connected to a multimedia file search system, which uses existing search technology to obtain multimedia file information from Internet and/or local devices. For example, it crawls the Internet with a crawler, extracts links related to music files, saves the corresponding text information (anchor text, page title, tags, and so on) as the music file information to be corrected, and provides that information to the multimedia file information correction system.
  • the music file information includes a song name, an artist name, an album name, and the like.
  • Because the multimedia file search system is prior art, it is not described further here.
  • The information saving module stores the determined information of music files, that is, information such as the determined song name and the artist name and album name corresponding to that song.
  • Table 1 shows one implementation of the music file information in the information saving module:
    Table 1
    No. | Song name | Artist name | Album name
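A minimal sketch of how Table 1 could be held in memory, assuming one row per (song, artist, album) combination. The class `InfoSavingModule`, its methods and the sample rows (including the placeholder names "Singer B", "Album A" and "Album B") are hypothetical; the source specifies only the columns. Indexing by song name mirrors how the later determination steps look up the artists and albums recorded for a song.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SongRecord:
    """One row of Table 1 (hypothetical field names)."""
    number: int
    song_name: str
    artist_name: str
    album_name: str


class InfoSavingModule:
    """Hypothetical in-memory stand-in for the information saving module."""

    def __init__(self, records):
        self.records = list(records)
        # Index rows by song name so the artists and albums recorded
        # for a given song can be looked up directly.
        self.by_song = defaultdict(list)
        for rec in self.records:
            self.by_song[rec.song_name].append(rec)

    def song_names(self):
        return {rec.song_name for rec in self.records}

    def artists_for(self, song_name):
        # .get avoids inserting empty lists for unknown songs.
        return sorted({r.artist_name for r in self.by_song.get(song_name, [])})

    def albums_for(self, song_name, artist_name=None):
        # Optionally narrow the albums to those by a given artist.
        return sorted({r.album_name for r in self.by_song.get(song_name, [])
                       if artist_name is None or r.artist_name == artist_name})


# Placeholder rows for illustration only.
store = InfoSavingModule([
    SongRecord(1, "Nocturne", "Jay Chou", "Album A"),
    SongRecord(2, "Nocturne", "Singer B", "Album B"),
])
```

Here `store.artists_for("Nocturne")` returns two artists, which is the situation in which the artist name determination step has to fall back to matching.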
  • The music information may also be saved in other ways, for example: storing information on all albums, including the song names in each album and the corresponding artist names; or storing information on all artists, including each artist name, the names of the songs the artist sings, and the album name corresponding to each song. It is also possible not to save the correspondence among song names, artist names and album names at all, but simply to save a set of song names, a set of artist names and a set of album names.
  • The information saving module can be a single module or a plurality of saving modules; that is, the song information can be saved in one module or distributed across several.
  • The word segmentation module in FIG. 2 performs word segmentation on the music file information found by the multimedia file search system, that is, on the anchor text, web page title and tag content saved in text form.
  • Word segmentation by this module falls into two cases:
  • In the first case, the word segmentation module segments the text using spaces and/or punctuation as separators; for example, “Jay Chou - Nocturne” is divided into “Jay Chou” and “Nocturne”.
  • In the second case, the anchor text, page title and tag content contain no separators, as in “Jay Chou nocturne” (written without separators in the original Chinese). Because there are no separators, the separator-based word segmentation method above cannot segment the text.
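A minimal sketch of the first, separator-based case. The exact separator set is an assumption (the source says only "space and/or punctuation"). The example uses the Chinese forms of the names, 周杰伦 (Jay Chou) and 夜曲 (Nocturne), because splitting English text on spaces would also break a multi-word name such as "Jay Chou".

```python
import re

# Assumed separator set: whitespace plus common ASCII and full-width
# punctuation. A run of separators counts as a single break.
SEPARATORS = re.compile(r"[\s\-_|,.;:!?/()\[\]，。；：！？、（）《》]+")


def split_on_separators(text):
    """Case 1: split on spaces/punctuation and drop empty pieces."""
    return [piece for piece in SEPARATORS.split(text) if piece]


print(split_on_separators("周杰伦 - 夜曲"))  # ['周杰伦', '夜曲']
```

Without separators, `split_on_separators("周杰伦夜曲")` returns the whole string as a single piece, which is the limitation the second case addresses.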
  • For this case, the embodiment of the present invention uses the song names, artist names and album names in the information saving module as a dictionary and applies a word segmentation method such as reverse maximum matching, forward maximum matching or statistics-based word segmentation.
  • Several word segmentation methods can also be combined; for example, forward maximum matching and reverse maximum matching can be combined into a bidirectional matching method.
  • Because artist names, song names and album names are proper nouns with little ambiguity, dictionary-based word segmentation achieves good results. For example, “Jay Chou nocturne” can be correctly divided into the two words “Jay Chou” and “nocturne”.
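A minimal sketch of dictionary-based forward, reverse and bidirectional maximum matching, with names from the information saving module as the dictionary (here the Chinese forms 周杰伦 for Jay Chou and 夜曲 for Nocturne). The tie-breaking rule in `bidirectional_max_match` (fewer words, then fewer single-character words, otherwise the reverse result) is a common heuristic assumed for illustration; the source does not say how the two results are reconciled. Characters not covered by the dictionary fall back to single-character words.

```python
def forward_max_match(text, dictionary, max_len):
    words, i = [], 0
    while i < len(text):
        # Try the longest dictionary word starting at i; fall back to 1 char.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, dictionary, max_len):
    words, j = [], len(text)
    while j > 0:
        # Try the longest dictionary word ending at j; fall back to 1 char.
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.insert(0, piece)
                j -= size
                break
    return words


def bidirectional_max_match(text, dictionary):
    max_len = max((len(w) for w in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)

    # Assumed heuristic: fewer words, then fewer single-character words.
    def cost(words):
        return (len(words), sum(1 for w in words if len(w) == 1))

    return fwd if cost(fwd) < cost(bwd) else bwd


names = {"周杰伦", "夜曲"}
print(bidirectional_max_match("周杰伦夜曲", names))  # ['周杰伦', '夜曲']
```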
  • After the anchor text, page title and tag content saved in text form have been segmented, the music file information to be corrected is converted from running text into one word or a group of words; in the example above, it becomes the words “Jay Chou” and “Nocturne”.
  • The filtering module shown in FIG. 2 filters the music file information to be corrected, removing information that is meaningless to the user, such as advertisements and fraudulent information.
  • The filtering module can filter the information of the multimedia file to be corrected either directly, without word segmentation, or after word segmentation; the following description takes filtering after word segmentation as an example.
  • Filtering conditions are stored in the filtering module.
  • The filtering module filters out unnecessary information such as advertisements and fraudulent information according to the filtering conditions: any item in the multimedia file information to be corrected that satisfies a filtering condition is removed.
  • A filtering condition consists of two parts, an action and a comparison value. The action may be “contains”, “equals”, “length greater than”, “length less than”, and so on, and the comparison value may be any kind of information.
  • Examples of filtering conditions in this embodiment are “contains WWW” and “contains XX” (where XX stands for pornographic vocabulary). If one or more items in the multimedia file information to be corrected satisfy a filtering condition, for example because they contain “WWW” or “XX”, the filtering module removes those items.
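A minimal sketch of filtering conditions as (action, comparison value) pairs. The action keys, the case-insensitive "contains" test and the example condition "length less than 2" are assumptions for illustration; the source names the actions only in prose.

```python
from dataclasses import dataclass

# Hypothetical action names for "contains", "equals",
# "length greater than" and "length less than".
ACTIONS = {
    "contains": lambda item, value: str(value).lower() in item.lower(),
    "equals": lambda item, value: item == value,
    "length_gt": lambda item, value: len(item) > value,
    "length_lt": lambda item, value: len(item) < value,
}


@dataclass(frozen=True)
class FilterCondition:
    action: str    # one of the ACTIONS keys
    value: object  # comparison value: text or a length

    def matches(self, item):
        return ACTIONS[self.action](item, self.value)


def apply_filters(items, conditions):
    """Remove every item that satisfies at least one filtering condition."""
    return [item for item in items
            if not any(cond.matches(item) for cond in conditions)]


conditions = [FilterCondition("contains", "WWW"), FilterCondition("length_lt", 2)]
print(apply_filters(["Jay Chou", "Nocturne", "www.example.com", "-"], conditions))
# ['Jay Chou', 'Nocturne']
```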
  • The filtering module of the present invention can also filter such information according to other filtering conditions, with the following processing flow:
  • the ratio between the two is evaluated; if it exceeds a threshold (for example, 50%), the item is judged to be fraudulent information, and the corresponding entry in the multimedia file information to be corrected is removed.
  • After this filtering, fraudulent information, advertising and other information the user does not need have largely been removed, and a set of filtered information is obtained.
  • Besides being saved in the filtering module, the filtering conditions may instead be saved in the information saving module; the filtering module is then connected to the information saving module and retrieves the filtering conditions from it to perform filtering.
  • In other words, filtering removes some of the search items from the segmented anchor text, page title and tag content and yields a set of filtered information.
  • The information correction module shown in FIG. 2 searches the information saving module for determined information related to the multimedia file to be corrected, and replaces the information of the multimedia file to be corrected with the determined information found.
  • The multimedia file information to be corrected here may be the original information without word segmentation or filtering, information that has undergone only word segmentation or only filtering, or information that has undergone both. The following description takes information after both word segmentation and filtering as an example.
  • Song name determination step: the segmented and filtered information is sorted in the order anchor text, title, tag, and then matched against the song names in the information saving module to check whether there is an exactly matching song name. If there is, the first matching song name is used as the song name of the music file; otherwise, the song name with the highest similarity, provided it meets the similarity criterion, is used. In other words, the song name, artist name and album name found in the embodiment of the present invention may be either exact matches or the most similar candidates.
  • Similarity is defined as the ratio of the number of characters shared by two pieces of information S1 and S2 to the average length of S1 and S2, that is, sim(S1, S2) = shared(S1, S2) / ((|S1| + |S2|) / 2).
  • For example, the similarity between “ABC” and “BCA” is 100% (3 shared characters, average length 3), the similarity between “ABC” and “BCD” is 67% (2/3), and the similarity between “ABC” and “BA” is 80% (2/2.5).
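A minimal sketch of the similarity measure. Counting shared characters as a multiset intersection, so that a repeated character counts only as often as it occurs in both strings, is an assumption; the source does not address repeats. The sketch reproduces the three worked examples.

```python
from collections import Counter


def similarity(s1, s2):
    """Shared characters divided by the average length of s1 and s2."""
    if not s1 and not s2:
        return 0.0  # assumed value for two empty strings (avoids 0 / 0)
    # Counter & Counter keeps the minimum count of each character.
    shared = sum((Counter(s1) & Counter(s2)).values())
    return shared / ((len(s1) + len(s2)) / 2)


for a, b in [("ABC", "BCA"), ("ABC", "BCD"), ("ABC", "BA")]:
    print(a, b, f"{similarity(a, b):.0%}")
# ABC BCA 100%
# ABC BCD 67%
# ABC BA 80%
```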
  • The similarity criterion should be set to an appropriate value, such as 70%: a candidate whose similarity is below 70% cannot be used as the song name.
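A minimal sketch of the song name determination step, assuming the segmented and filtered words are already ordered anchor text, title, tag. The hypothetical `pick_name` returns the first exact match, otherwise the most similar candidate provided it reaches the 70% example criterion, otherwise `None`. The similarity function is repeated so the block stands alone; breaking ties by alphabetical candidate order is an assumption.

```python
from collections import Counter

SIMILARITY_CRITERION = 0.70  # the example value given in the text


def similarity(s1, s2):
    if not s1 and not s2:
        return 0.0
    shared = sum((Counter(s1) & Counter(s2)).values())
    return shared / ((len(s1) + len(s2)) / 2)


def pick_name(words, candidates, criterion=SIMILARITY_CRITERION):
    """First exact match in word order, else the best match >= criterion."""
    for word in words:  # exact matching pass, in anchor/title/tag order
        if word in candidates:
            return word
    best, best_score = None, 0.0
    for word in words:  # fallback: highest similarity over all pairs
        for cand in sorted(candidates):
            score = similarity(word, cand)
            if score > best_score:
                best, best_score = cand, score
    return best if best_score >= criterion else None


song_names = {"Nocturne", "Sunny Day"}
print(pick_name(["Jay Chou", "Nocturne"], song_names))  # Nocturne (exact)
print(pick_name(["Nocturn"], song_names))               # Nocturne (about 93%)
print(pick_name(["Love Entertainment"], song_names))    # None
```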
  • Artist name determination step: after the song name has been determined, if the song name corresponds to a unique artist name in the music information saving module, the artist name is determined at the same time. If it corresponds to several artist names, songs with the same name have been sung by several artists; in that case, the segmented and filtered information is matched, in the order anchor text, title, tag, against the artist names corresponding to the song name to check for an exactly matching artist name. If there is one, the first matching artist name is used as the artist name of the music file; otherwise, the artist name with the highest similarity that meets the criterion is used. If no artist name is found, the artist name is left blank.
  • Album name determination step: after the song name and artist name have been determined, if the song name corresponds to a unique album name in the music information saving module, that album name is used. If there are several album names, the segmented and filtered words are matched, in the order anchor text, title, tag, against the album names corresponding to the song name to check for an exactly matching album name. If there is one, the first matching album name is used as the album name of the music file; otherwise, the album name with the highest similarity that meets the criterion is used. If no album name is found, the album name is left blank.
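A minimal sketch of the artist and album determination steps once the song name is known: a unique candidate is taken directly, several candidates are resolved by the same exact-then-similarity matching as the song name, and no match leaves the field blank. Narrowing the album candidates to the determined artist (when there is one) is an assumption drawn from the order of the steps; the helper names and the rows ("Friends", "Singer A", "Album 1", and so on) are hypothetical placeholders.

```python
from collections import Counter


def similarity(s1, s2):
    if not s1 and not s2:
        return 0.0
    shared = sum((Counter(s1) & Counter(s2)).values())
    return shared / ((len(s1) + len(s2)) / 2)


def pick_name(words, candidates, criterion=0.70):
    for word in words:  # exact match first, in anchor/title/tag order
        if word in candidates:
            return word
    best, best_score = None, 0.0
    for word in words:  # otherwise the most similar candidate
        for cand in sorted(candidates):
            score = similarity(word, cand)
            if score > best_score:
                best, best_score = cand, score
    return best if best_score >= criterion else None


def resolve_field(words, candidates):
    """Unique candidate: use it; several: match; nothing found: blank."""
    candidates = set(candidates)
    if len(candidates) == 1:
        return next(iter(candidates))
    return pick_name(words, candidates) or ""


def determine_artist_and_album(words, song_name, rows):
    """rows: (song, artist, album) tuples from the information saving module."""
    artists = {artist for song, artist, _ in rows if song == song_name}
    artist = resolve_field(words, artists)
    albums = {album for song, a, album in rows
              if song == song_name and (not artist or a == artist)}
    return artist, resolve_field(words, albums)


rows = [
    ("Friends", "Singer A", "Album 1"),
    ("Friends", "Singer B", "Album 2"),
    ("Nocturne", "Jay Chou", "Album X"),
]
print(determine_artist_and_album(["Singer B", "Friends"], "Friends", rows))
# ('Singer B', 'Album 2')
```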
  • The output module shown in FIG. 2 outputs the multimedia file information corrected by the information correction module.
  • The information saving module may instead store a correspondence between the hash codes of multimedia files and their determined information. The information correction module then looks up the determined information corresponding to the hash code of the multimedia file to be corrected, treats it as the related determined information, and replaces the information to be corrected with it, thereby correcting the multimedia file information.
  • The data in the information saving module can be saved in the format shown in Table 2.
    Table 2
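  • The file hash code can be, for example, a 32-bit or 64-bit integer, or a longer digest. Commonly used hash algorithms include Cyclic Redundancy Check 32 (CRC32), which produces a 32-bit value, and Message Digest Algorithm 5 (MD5), which produces a 128-bit digest. If two files F1 and F2 have equal hash codes computed with the same algorithm, their contents can be considered identical (strictly, only with high probability, since different contents can occasionally produce the same hash).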
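A minimal sketch of the hash-code variant using Python's standard `zlib.crc32` and `hashlib.md5`. The class `HashInfoStore` and its methods are hypothetical. Masking the CRC with `0xFFFFFFFF` keeps it an unsigned 32-bit integer, and MD5 is shown as its 128-bit hex digest. Returning the information to be corrected unchanged when no hash matches is an assumption; the source does not cover that case.

```python
import hashlib
import zlib


def crc32_code(data: bytes) -> int:
    return zlib.crc32(data) & 0xFFFFFFFF  # unsigned 32-bit integer


def md5_code(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()  # 128-bit digest as 32 hex chars


class HashInfoStore:
    """Hypothetical table: file hash code -> determined information."""

    def __init__(self, hash_fn=crc32_code):
        self.hash_fn = hash_fn
        self.table = {}

    def save(self, file_bytes, determined_info):
        self.table[self.hash_fn(file_bytes)] = dict(determined_info)

    def correct(self, file_bytes, info_to_correct):
        determined = self.table.get(self.hash_fn(file_bytes))
        # Replace with the determined information when the hash is known.
        return dict(determined if determined is not None else info_to_correct)


store = HashInfoStore()
audio = b"placeholder audio bytes"  # stands in for a real file's contents
store.save(audio, {"song": "Nocturne", "artist": "Jay Chou"})
print(store.correct(audio, {"song": "Nocturne Love Entertainment"}))
# {'song': 'Nocturne', 'artist': 'Jay Chou'}
```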
  • The music file information correction method mainly includes the following steps. Step S1: use existing search technology to acquire the information of the music files to be corrected from the Internet. Step S2: perform word segmentation on the multimedia file information to be corrected; the specific word segmentation process is as described above and is not repeated here.
  • Step S3: filter the segmented multimedia file information to be corrected to remove unnecessary information.
  • Step S4: based on the segmented and filtered multimedia file information to be corrected, find the determined information related to the multimedia file to be corrected, and replace the multimedia file information to be corrected with the determined information found.
  • This step may further include: if a determined song name is found, replacing the song name of the music file to be corrected with it; if the found song name has a unique corresponding artist name, replacing the artist name to be corrected with that artist name, and otherwise replacing it with the artist name found by matching; if the found song name and/or artist name has a unique corresponding album name, replacing the album name of the music file to be corrected with that album name, and otherwise replacing it with the album name found by matching.
  • In the step described here, the song name, artist name and album name found are exact matches.
  • Step S5: output the corrected multimedia file information, which may further be displayed to the user.
  • In summary, by using the determined multimedia file information, a wide range of inaccurate multimedia file information can be effectively corrected to obtain accurate information about multimedia files.
  • The multimedia file information correction system and method of the embodiments of the present invention are also highly efficient, because no tools such as manually maintained templates are required.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a system and method for correcting multimedia file information. The system comprises an information saving module and an information correction module: the information saving module is used to save the determined information of multimedia files, and the information correction module is used to search the information saving module for determined information related to the multimedia file to be corrected and to replace the information of that multimedia file with the determined information found. The method for correcting multimedia file information comprises the steps of: A) searching, on the basis of the information of the multimedia file to be corrected, the determined information of multimedia files for determined information related to the information of the multimedia file to be corrected; B) replacing the information of the multimedia file to be corrected with the determined information found.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2006100611886A CN101071422B (zh) 2006-06-15 2006-06-15 Music file search processing system and method
CN200610061188.6 2006-06-15

Publications (1)

Publication Number Publication Date
WO2007147359A1 (fr) 2007-12-27

Family

ID=38833082

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2007/070114 WO2007147359A1 (fr) 2006-06-15 2007-06-14 System and method for correcting multimedia file information

Country Status (2)

Country Link
CN (1) CN101071422B (fr)
WO (1) WO2007147359A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827074A (zh) * 2019-10-31 2020-02-21 夏振宇 Method for evaluating advertisement placement using video and speech analysis

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
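CN102207941A (zh) * 2010-03-29 2011-10-05 上海博泰悦臻电子设备制造有限公司 Method and device for providing and acquiring in-vehicle music, and in-vehicle music transmission system
CN102289439B (zh) * 2010-06-18 2014-10-22 上海博泰悦臻电子设备制造有限公司 Music file providing method and providing system therefor
CN102289440A (zh) * 2010-06-18 2011-12-21 上海博泰悦臻电子设备制造有限公司 Music file providing method and providing system therefor
CN102929874A (zh) * 2011-08-08 2013-02-13 深圳市快播科技有限公司 Method and device for sorting retrieved data
CN103150595B (zh) * 2011-12-06 2016-03-09 腾讯科技(深圳)有限公司 Automatic pairing selection method and device in a data processing system
CN102662957B (zh) * 2012-03-02 2015-02-18 百度在线网络技术(北京)有限公司 Device and method for optimizing the search result page of a browser
CN103049578A (zh) * 2013-01-15 2013-04-17 深圳市宜搜科技发展有限公司 Method and system for acquiring song information
CN104484379B (zh) * 2014-12-09 2018-06-12 百度在线网络技术(北京)有限公司 Method and device for determining music entity relationships, and query processing method and device
CN105808627A (zh) * 2014-12-31 2016-07-27 高德软件有限公司 Methods and devices for POI information updating and retrieval and for POI data package generation
CN105608129B (zh) * 2015-12-16 2019-04-26 北京奇虎科技有限公司 File cleaning method, device and system, and mobile terminal
WO2018094689A1 (fr) * 2016-11-25 2018-05-31 深圳前海达闼云端智能科技有限公司 Method, apparatus and device for improving the browsing experience
JP6483825B2 (ja) * 2016-12-09 2019-03-13 Google LLC Preventing the distribution of forbidden network content using automatic variant detection
CN108304367B (zh) * 2017-04-07 2021-11-26 腾讯科技(深圳)有限公司 Word segmentation method and device
CN108319635A (zh) * 2017-12-15 2018-07-24 海南智媒云图科技股份有限公司 Method for integrated playback of music resources from multiple platforms, electronic device and storage medium
CN110717062B (zh) * 2018-07-11 2024-03-22 斑马智行网络(香港)有限公司 Music search and in-vehicle music playback method, apparatus, device and storage medium
CN109543064B (zh) * 2018-11-30 2020-12-18 北京微播视界科技有限公司 Lyrics display processing method and apparatus, electronic device and computer storage medium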

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1501395A (zh) * 1995-07-26 2004-06-02 Method for updating the memory in a compact disc changer and recording medium player
US20050065912A1 (en) * 2003-09-02 2005-03-24 Digital Networks North America, Inc. Digital media system with request-based merging of metadata from multiple databases
JP2005309712A (ja) * 2004-04-21 2005-11-04 Sharp Corp Music search system and music search method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4189758B2 (ja) * 2004-06-30 2008-12-03 Sony Corp Content storage device, content storage method, content storage program, content transfer device, content transfer program, and content transfer storage system

Also Published As

Publication number Publication date
CN101071422A (zh) 2007-11-14
CN101071422B (zh) 2010-10-13

Similar Documents

Publication Publication Date Title
WO2007147359A1 (fr) System and method for correcting multimedia file information
Haveliwala et al. Scalable Techniques for Clustering the Web.
US11487816B2 (en) Facilitating video search
US6978419B1 (en) Method and apparatus for efficient identification of duplicate and near-duplicate documents and text spans using high-discriminability text fragments
US7730316B1 (en) Method for document fingerprinting
US7917480B2 (en) Document compression system and method for use with tokenspace repository
US20070250501A1 (en) Search result delivery engine
US7792877B2 (en) Scalable minimal perfect hashing
US20120166414A1 (en) Systems and methods for relevance scoring
US20100228711A1 (en) Enterprise Search Method and System
US20130086096A1 (en) Method and System for High Performance Pattern Indexing
JP5616444B2 (ja) Method and system for document indexing and data querying
JP2014041615A (ja) Method and system for high-performance data metatagging and data indexing using coprocessors
US8316041B1 (en) Generation and processing of numerical identifiers
WO2008098502A1 (fr) Method and device for creating an index, and retrieval method and system
Liu et al. Information retrieval and Web search
US20220147526A1 (en) Keyword and business tag extraction
US20100161615A1 (en) Index anaysis apparatus and method and index search apparatus and method
EP3926484A1 (fr) Improved fuzzy search using field-level deletion neighborhoods
CN103136212A (zh) Method and device for mining new words of a category
US8682900B2 (en) System, method and computer program product for documents retrieval
US20070239735A1 (en) Systems and methods for predicting if a query is a name
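CN105574004A (zh) Web page deduplication method and device
CN111091003A (zh) Parallel extraction method based on knowledge graph queries
JP2004220176A (ja) Database search system, search method therefor, method for creating data files used in the search, and recording medium storing the data files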

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07721735

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS EPO FORM 1205A DATED 04.05.2009.

122 Ep: pct application non-entry in european phase

Ref document number: 07721735

Country of ref document: EP

Kind code of ref document: A1