CN103455642A - Method and device for multi-media file retrieval - Google Patents

Method and device for multi-media file retrieval

Info

Publication number
CN103455642A
Authority
CN
China
Prior art keywords
multimedia file
keyword
phonetic feature
description
sounder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013104694873A
Other languages
Chinese (zh)
Other versions
CN103455642B (en)
Inventor
胡锴亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics China R&D Center, Samsung Electronics Co Ltd filed Critical Samsung Electronics China R&D Center
Priority to CN201310469487.3A priority Critical patent/CN103455642B/en
Publication of CN103455642A publication Critical patent/CN103455642A/en
Application granted granted Critical
Publication of CN103455642B publication Critical patent/CN103455642B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for multimedia file retrieval. When a voice command is received, the action keyword and the description keyword in the command are identified and separated. The separated description keyword is matched against stored multimedia files that have description voice data embedded in them; if a file matches, the action indicated by the separated action keyword is performed on the matched file. Based on the same concept, the invention further provides a device. Multimedia files can thus be retrieved simply by voice, giving a friendly user experience.

Description

Method and apparatus for multimedia file retrieval
Technical field
The present application relates to the field of voice processing technology, and in particular to a method and apparatus for multimedia file retrieval.
Background
At present, multimedia recording modules such as cameras and microphones have become standard on everyday personal electronic devices such as cameras, mobile phones, and tablet computers. People increasingly use these devices to record moments of their lives, and have accumulated large numbers of multimedia files such as photos, videos, and audio recordings.
These files are generally named with a sequence number or the capture time, but such naming makes retrieving and managing the files quite inconvenient.
For example, a user who wants to find a few photos generally has to browse a large number of photos and pick out the ones needed. This is particularly inconvenient on terminals without general-purpose input devices such as a mouse and keyboard; on a TV, for instance, photos can only be browsed with a simple remote control.
Existing solutions to the multimedia file retrieval problem basically all establish a mapping between keywords and multimedia files, and save the mapping either as a separate file or as records in a database.
The drawback of these solutions is that the retrieved file is only loosely coupled to the mapping file, and a large number of mapping files may be produced, which makes the files look cluttered to the user. In addition, if a retrieved file is renamed, the mapping file or database may need to be updated at the same time; and when retrieved files are transferred to another storage or display device, the mapping file or database file must be transferred with them.
Summary of the invention
In view of this, the present application provides a method and apparatus for multimedia file retrieval that allow multimedia files to be retrieved simply by voice, giving a friendly user experience.
To solve the above technical problem, the technical solution of the present invention is implemented as follows:
A method for multimedia file retrieval, the method comprising:
storing multimedia files, wherein description voice data for each multimedia file was embedded in the file when it was captured;
when a voice command for retrieving a multimedia file is received, identifying and separating the action keyword in the voice command and the description keyword of the multimedia file to be retrieved;
matching the stored multimedia files against the separated description keyword, wherein, when each multimedia file is matched, the description keyword of the file is recognized from the description voice data embedded in it, and if the separated description keyword matches the recognized description keyword, the file is determined to be a matching multimedia file;
performing, on the matched multimedia files, the action corresponding to the separated action keyword.
A device, comprising: a storage unit, a receiving unit, a recognition unit, a matching unit, and a processing unit;
the storage unit is configured to store multimedia files, wherein description voice data for each multimedia file was embedded in the file when it was captured;
the receiving unit is configured to receive a voice command for retrieving a multimedia file;
the recognition unit is configured to, when the receiving unit receives a voice command for retrieving a multimedia file, identify and separate the action keyword in the voice command and the description keyword of the multimedia file to be retrieved;
the matching unit is configured to match the multimedia files stored in the storage unit against the description keyword separated by the recognition unit, wherein, when each multimedia file is matched, the description keyword of the file is recognized from the description voice data embedded in it, and if the separated description keyword matches the recognized description keyword, the file is determined to be a matching multimedia file;
the processing unit is configured to perform, on the multimedia files matched by the matching unit, the action corresponding to the action keyword separated by the recognition unit.
In summary, when a voice command is received, the present application separates the action keyword and the description keyword in the command. The separated description keyword is matched against the stored multimedia files in which description voice data is embedded, and if a multimedia file is matched, the action corresponding to the separated action keyword is performed on the retrieved file. Multimedia files can thus be retrieved by speech recognition without saving or maintaining a mapping between keywords and multimedia files.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method for retrieving a multimedia file in Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of the method for retrieving a multimedia file in Embodiment 2 of the present invention;
Fig. 3 is a schematic flowchart of the method for retrieving a multimedia file in Embodiment 3 of the present invention;
Fig. 4 is a schematic flowchart of the method for retrieving a multimedia file in Embodiment 4 of the present invention;
Fig. 5 is a schematic flowchart of the method for determining the copyright of a multimedia file in Embodiment 5 of the present invention;
Fig. 6 is a schematic structural diagram of a device to which the above technique is applied.
Detailed description
To make the purpose, technical solution, and advantages of the present invention clearer, the solution is described in further detail below with reference to the accompanying drawings and embodiments.
The embodiments of the present invention propose a method for multimedia file retrieval. When a voice command is received, the action keyword and the description keyword in the command are separated. The separated description keyword is matched against the stored multimedia files in which description voice data is embedded, and if a multimedia file is matched, the action corresponding to the separated action keyword is performed on the retrieved file. Multimedia files can thus be retrieved simply by voice, giving a friendly user experience.
In the embodiments of the invention, the device that captures a multimedia file and the device that retrieves it may be the same device or different devices. If they are different, the multimedia file with embedded voice data must be transferred to the retrieving device.
The retrieving device stores multimedia files, wherein description voice data for each multimedia file was embedded in the file when it was captured.
If the device is able to capture multimedia files, it stores the captured files directly; if it is not, multimedia files captured on a capturing device are transferred to it.
The description voice data for a multimedia file may be a captured description of the file spoken by a speaker.
For example, when a picture is taken, the speaker's spoken description of the picture is captured directly and embedded in the picture; the content of the description voice data might be: "This is a picture taken in Shanghai." The description may also be more detailed, for example naming a specific place.
The description voice data for a multimedia file may also be preset default voice data.
The preset default voice data is a description recorded in advance. Suppose its content is likewise: "This is a picture taken in Shanghai." Every picture taken in Shanghai can then have this preset default voice data embedded, so the description voice data need not be captured each time.
Embedding the description voice data for a multimedia file when the file is captured comprises: embedding the description voice data in the captured multimedia file as extension data, metadata, a digital watermark, or in any other form that preserves the original format of the voice data.
Embedding as extension data means embedding the description voice data as an extension header of the multimedia file; embedding as metadata means embedding it as auxiliary data of the multimedia file; embedding as a digital watermark means embedding it in the multimedia file as a digital watermark.
Because the description voice is embedded in the file, the embodiments of the invention do not need to separately save description keywords, or the mapping between a multimedia file and its description keywords, as existing implementations do. When a multimedia file is transferred, renamed, or otherwise manipulated, no such mapping needs to be maintained, and retrieval of the file is unaffected.
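As a rough illustration of the idea that the description travels inside the file, the following Python sketch appends the description voice data to the media bytes as a length-suffixed trailer. The marker `DVOX`, the trailer layout, and the function names are assumptions made for this example only; the patent does not define a byte layout, and it describes the extension-data variant as an extension header rather than a trailer.

```python
import struct

MAGIC = b"DVOX"  # hypothetical 4-byte marker, not defined by the patent


def embed_description(media: bytes, voice: bytes) -> bytes:
    """Return media | voice | 4-byte big-endian len(voice) | MAGIC."""
    return media + voice + struct.pack(">I", len(voice)) + MAGIC


def extract_description(blob: bytes):
    """Return (media, voice); voice is None when no trailer is present."""
    if len(blob) < 8 or blob[-4:] != MAGIC:
        return blob, None
    (voice_len,) = struct.unpack(">I", blob[-8:-4])
    if voice_len > len(blob) - 8:
        return blob, None  # length field is inconsistent; treat as no trailer
    media_end = len(blob) - 8 - voice_len
    return blob[:media_end], blob[media_end:len(blob) - 8]


# Usage: the voice data survives any rename or copy because it is file content.
media = b"\xff\xd8fake-jpeg\xff\xd9"   # placeholder bytes, not a real image
voice = b"pcm-bytes"                   # placeholder for recorded audio
blob = embed_description(media, voice)
print(extract_description(blob) == (media, voice))
```

Since the voice data is part of the file contents, renaming or copying the file leaves it retrievable with no external mapping to update, which is the property the paragraph above relies on.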
Embodiment 1
Referring to Fig. 1, Fig. 1 is a schematic flowchart of the method for retrieving a multimedia file in Embodiment 1 of the present invention. The specific steps are:
Step 101: when the device receives a voice command for retrieving a multimedia file, it identifies and separates the action keyword in the voice command and the description keyword of the multimedia file to be retrieved.
When the device receives a voice command, the command is recognized by the device's own speech recognition engine or program, or by one provided by another device, and the recognized keywords are separated into description keywords and action keywords.
Suppose the content of the received voice command is "show the photos of Shanghai". The separated action keyword is "show" and the description keyword is "Shanghai".
How the keywords in a voice command are recognized and separated can be implemented with existing speech recognition technology, so no specific implementation is given here.
Step 102: the device matches the stored multimedia files against the separated description keyword.
When the device matches each multimedia file, it recognizes the description keyword of the file from the description voice data embedded in it. If the separated description keyword matches the recognized description keyword, the file is determined to be a matching multimedia file; if it does not match, the file is determined not to be a matching multimedia file.
For example, if the description keyword recognized from the description voice embedded in a multimedia file is "Shanghai", that file is determined to be a matching multimedia file; if the recognized description keyword is "Beijing", the file is determined to be a non-matching multimedia file.
Step 103: the device performs, on the matched multimedia files, the action corresponding to the separated action keyword.
In the example above, where the separated action keyword is "show", the corresponding action is to display the matched multimedia files.
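The three steps of Embodiment 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: speech recognition is replaced by plain text transcripts (the `description_text` field stands in for recognizing the embedded description voice data), and the action vocabulary, the stop-word list, and all names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical action vocabulary; the patent only gives "show" as an example.
ACTIONS = {"show", "play", "delete"}
# Hypothetical filler words dropped when extracting description keywords.
STOP_WORDS = {"the", "of", "in", "photo", "photos", "picture", "pictures"}


@dataclass
class MediaFile:
    name: str
    description_text: str  # stand-in for ASR output of the embedded voice data


def split_command(transcript: str):
    """Step 101: split a recognized command into (action, description keywords)."""
    words = transcript.lower().split()
    action = next((w for w in words if w in ACTIONS), None)
    description = [w for w in words if w != action and w not in STOP_WORDS]
    return action, description


def match_files(files, description):
    """Step 102: keep files whose embedded description contains every keyword."""
    return [f for f in files
            if all(k in f.description_text.lower().split() for k in description)]


def execute(action, files):
    """Step 103: perform the action; this sketch only reports what would be done."""
    return [f"{action}:{f.name}" for f in files]


files = [
    MediaFile("IMG_0001.jpg", "this picture was taken in Shanghai"),
    MediaFile("IMG_0002.jpg", "this picture was taken in Beijing"),
]
action, description = split_command("show the photos of Shanghai")
print(execute(action, match_files(files, description)))  # ['show:IMG_0001.jpg']
```

Note that the file names play no part in the match; only the description carried inside each file does.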
Embodiment 2
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the method for retrieving a multimedia file in Embodiment 2 of the present invention. The specific steps are:
Step 201: when the device receives a voice command for retrieving a multimedia file, it identifies and separates the action keyword in the voice command and the description keyword of the multimedia file to be retrieved.
Step 202: the device determines whether the separated description keyword matches a preset description keyword; if so, step 203 is performed; otherwise, step 205 is performed.
Suppose the content of the received voice command is "show the photos I took". The recognized and separated action keyword is "show" and the description keyword is "the photos I took". Of course, in a specific implementation the description keyword may also be "I" or the like, depending on how the implementation is configured.
In this case the separated description keyword refers to a specific person, so step 203 must be performed to match by the speaker's voice feature.
A preset description keyword is a description keyword that relates to the speaker himself or herself, such as "I" or "the photos I took".
Step 203: the device performs voice feature recognition on the received voice command to obtain the voice feature of the speaker of the command.
This step may extract the speaker's voice feature using biometric recognition technology, which is not described in detail here.
Step 204: the device matches the stored multimedia files using the obtained voice feature of the speaker, and then performs step 206.
When the device matches each multimedia file, it recognizes the voice feature of the description voice data embedded in the file. If the obtained voice feature of the speaker matches the recognized voice feature, the file is determined to be a matching multimedia file; if it does not match, the file is determined not to be a matching multimedia file.
Step 205: the device matches the stored multimedia files against the separated description keyword.
When the device matches each multimedia file, it recognizes the description keyword of the file from the description voice data embedded in it. If the separated description keyword matches the recognized description keyword, the file is determined to be a matching multimedia file; if it does not match, the file is determined not to be a matching multimedia file.
Step 206: the device performs, on the matched multimedia files, the action corresponding to the separated action keyword.
In this embodiment, when the separated description keyword matches a configured preset description keyword, that is, when it relates to the speaker himself or herself, the multimedia files are matched by the voice feature of the received voice command.
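The branch in steps 202 to 205 can be sketched in Python as below. The sketch assumes that voice features are already available as fixed-length numeric vectors and that two voices "match" when their cosine similarity reaches a threshold; the patent leaves feature extraction and the matching criterion to existing biometric techniques, so the vector representation, the threshold of 0.9, the preset keyword set, and all names here are hypothetical.

```python
import math

# Hypothetical preset description keywords that refer to the speaker.
PRESET_KEYWORDS = {"i", "me", "my", "mine"}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(description, speaker_feature, files, threshold=0.9):
    """files: dicts with 'name', 'keywords' (set of str), 'feature' (list of float).

    'keywords' and 'feature' stand in for what would be recognized from the
    description voice data embedded in each file.
    """
    if PRESET_KEYWORDS & set(description):
        # Steps 203-204: the command refers to the speaker, so match by voice.
        return [f["name"] for f in files
                if cosine(speaker_feature, f["feature"]) >= threshold]
    # Step 205: ordinary description-keyword matching.
    return [f["name"] for f in files if set(description) <= f["keywords"]]


files = [
    {"name": "a.jpg", "keywords": {"shanghai"}, "feature": [1.0, 0.0]},
    {"name": "b.jpg", "keywords": {"beijing"}, "feature": [0.0, 1.0]},
]
print(retrieve(["my", "photos"], [0.9, 0.1], files))  # ['a.jpg']
print(retrieve(["beijing"], [0.9, 0.1], files))       # ['b.jpg']
```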
Embodiment 3
Referring to Fig. 3, Fig. 3 is a schematic flowchart of the method for retrieving a multimedia file in Embodiment 3 of the present invention. The specific steps are:
Step 301: the device receives a voice command for retrieving a multimedia file, and identifies and separates the action keyword in the voice command and the description keyword of the multimedia file to be retrieved; it also performs voice feature recognition on the received voice command to obtain the voice feature of the speaker of the command.
In this embodiment, voice feature recognition may be performed directly when the voice command is received, or it may be deferred until the voice feature is needed, that is, performed just before step 303.
Step 302: the device matches the stored multimedia files against the separated description keyword.
When the device matches each multimedia file, it recognizes the description keyword of the file from the description voice data embedded in it. If the separated description keyword matches the recognized description keyword, the file is determined to be a matching multimedia file; if it does not match, the file is determined not to be a matching multimedia file.
Step 303: the device uses the obtained voice feature of the speaker to match again among the multimedia files matched by the separated description keyword.
Retrieval by the separated description keyword yields a number of multimedia files that match the description keyword. This step performs a second match among those files, using the obtained voice feature of the speaker.
When the device matches each multimedia file, it recognizes the voice feature of the description voice data embedded in the file. If the obtained voice feature of the speaker matches the recognized voice feature, the file is determined to be a matching multimedia file; otherwise, the file is determined not to be a matching multimedia file.
Step 304: the device performs, on the multimedia files matched by voice feature, the action corresponding to the separated action keyword.
This embodiment combines the description keyword and the voice feature for retrieval; the files finally obtained are in effect the intersection of the two retrieval results, which improves retrieval accuracy. In this embodiment, retrieval by description keyword is performed first, followed by retrieval by voice feature.
Embodiment 4
Referring to Fig. 4, Fig. 4 is a schematic flowchart of the method for retrieving a multimedia file in Embodiment 4 of the present invention. The specific steps are:
Step 401: when the device receives a voice command for retrieving a multimedia file, it identifies and separates the action keyword in the voice command and the description keyword of the multimedia file to be retrieved; it also performs voice feature recognition on the received voice command to obtain the voice feature of the speaker of the command.
Step 402: the device matches the stored multimedia files using the obtained voice feature of the speaker.
When the device matches each multimedia file, it recognizes the voice feature of the description voice data embedded in the file. If the obtained voice feature of the speaker matches the recognized voice feature, the file is determined to be a matching multimedia file; otherwise, the file is determined not to be a matching multimedia file.
Step 403: the device uses the separated description keyword to match again among the multimedia files matched by the speaker's voice feature.
Step 404: the device performs, on the multimedia files matched by the separated description keyword, the action corresponding to the separated action keyword.
This embodiment likewise combines the description keyword and the voice feature for retrieval; the files finally obtained are in effect the intersection of the two retrieval results, which improves retrieval accuracy. In this embodiment, retrieval by voice feature is performed first, followed by retrieval by description keyword.
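Embodiments 3 and 4 apply the same two filters in opposite order, which the following Python sketch makes explicit. As an assumption for illustration, voice features are fixed-length vectors and two voices match when their Euclidean distance is at most `max_distance`; neither the representation nor the threshold comes from the patent, and all names are hypothetical.

```python
import math


def by_keyword(files, keywords):
    """Keep files whose recognized description keywords include all of `keywords`."""
    return [f for f in files if set(keywords) <= f["keywords"]]


def by_voice(files, speaker_feature, max_distance=0.5):
    """Keep files whose embedded voice feature is close to the speaker's."""
    return [f for f in files
            if math.dist(speaker_feature, f["feature"]) <= max_distance]


def embodiment3(files, keywords, speaker_feature):
    # Steps 302-303: keyword match first, then voice-feature match.
    return by_voice(by_keyword(files, keywords), speaker_feature)


def embodiment4(files, keywords, speaker_feature):
    # Steps 402-403: voice-feature match first, then keyword match.
    return by_keyword(by_voice(files, speaker_feature), keywords)


files = [
    {"name": "a.jpg", "keywords": {"shanghai"}, "feature": [1.0, 0.0]},
    {"name": "b.jpg", "keywords": {"shanghai"}, "feature": [0.0, 1.0]},
    {"name": "c.jpg", "keywords": {"beijing"}, "feature": [1.0, 0.0]},
]
query = (["shanghai"], [0.9, 0.1])
print([f["name"] for f in embodiment3(files, *query)])  # ['a.jpg']
print([f["name"] for f in embodiment4(files, *query)])  # ['a.jpg']
```

Both orders return the same intersection, so the choice between them is a matter of cost: in a real system it would presumably be cheaper to run the more selective or less expensive filter first.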
Embodiment 5
Referring to Fig. 5, Fig. 5 is a schematic flowchart of the method for determining the copyright of a multimedia file in Embodiment 5 of the present invention. The specific steps are:
Step 501: when the device receives voice data from a speaker, it performs voice feature recognition to obtain the voice feature of the speaker.
Step 502: the device performs voice feature recognition on the description voice data embedded in the multimedia file whose copyright is to be determined, and obtains the voice feature of the description voice data.
Step 503: the device determines whether the obtained voice feature of the speaker matches the obtained voice feature of the description voice data.
Step 504: if the obtained voice feature of the speaker matches the obtained voice feature of the description voice data, the speaker is determined to be the copyright owner of the multimedia file; otherwise, the speaker is determined not to be the copyright owner of the multimedia file.
The biometric consistency of the voice is thus used to protect the copyright of the multimedia file.
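Steps 503 and 504 reduce to a single comparison once both voice features are available. The Python sketch below assumes the features are fixed-length vectors compared by cosine similarity against a threshold; the patent does not specify either, so the representation, the 0.95 threshold, and the function names are hypothetical, and a practical system would rely on a proper speaker-verification method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_copyright_owner(claimant_feature, embedded_feature, threshold=0.95):
    """Steps 503-504: the claimant owns the file if the two voice features match."""
    return cosine_similarity(claimant_feature, embedded_feature) >= threshold


# Feature of the description voice data embedded in the file (made-up numbers).
embedded = [0.21, 0.88, 0.41]
print(is_copyright_owner([0.2, 0.9, 0.4], embedded))  # True
print(is_copyright_owner([0.9, 0.1, 0.3], embedded))  # False
```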
Based on the same inventive concept, the present application also proposes a device. Referring to Fig. 6, Fig. 6 is a schematic structural diagram of a device to which the above technique is applied. The device comprises: storage unit 601, receiving unit 602, recognition unit 603, matching unit 604, and processing unit 605.
Storage unit 601 is configured to store multimedia files, wherein description voice data for each multimedia file was embedded in the file when it was captured.
Receiving unit 602 is configured to receive a voice command for retrieving a multimedia file.
Recognition unit 603 is configured to, when receiving unit 602 receives a voice command for retrieving a multimedia file, identify and separate the action keyword in the voice command and the description keyword of the multimedia file to be retrieved.
Matching unit 604 is configured to match the multimedia files stored in storage unit 601 against the description keyword separated by recognition unit 603, wherein, when each multimedia file is matched, the description keyword of the file is recognized from the description voice data embedded in it, and if the separated description keyword matches the recognized description keyword, the file is determined to be a matching multimedia file.
Processing unit 605 is configured to perform, on the multimedia files matched by matching unit 604, the action corresponding to the action keyword separated by recognition unit 603.
Preferably, the description voice data for the multimedia file is preset default voice data, or a captured description of the multimedia file spoken by a speaker.
Preferably,
for the multimedia files stored in storage unit 601, the description voice data for each multimedia file was embedded in the captured file, at capture time, as extension data, metadata, a digital watermark, or in a form that preserves the original format of the data.
Preferably,
Processing unit 605 is further configured to determine whether the description keyword separated by recognition unit 603 matches a preset description keyword.
Recognition unit 603 is further configured to, if processing unit 605 determines that the separated description keyword matches a preset description keyword, perform voice feature recognition on the received voice command to obtain the voice feature of the speaker of the command.
Matching unit 604 is further configured to match the stored multimedia files using the speaker's voice feature obtained by recognition unit 603, wherein, when each multimedia file is matched, the voice feature of the description voice data embedded in the file is recognized, and if the obtained voice feature of the speaker matches the recognized voice feature, the file is determined to be a matching multimedia file, and processing unit 605 is triggered to perform, on the matched multimedia files, the action corresponding to the separated action keyword; and, when processing unit 605 determines that the description keyword separated by recognition unit 603 does not match a preset description keyword, to match the stored multimedia files against the separated description keyword.
Preferably,
Recognition unit 603 is further configured to perform voice feature recognition on the received voice command to obtain the voice feature of the speaker of the command.
Matching unit 604 is further configured to, after matching the stored multimedia files against the separated description keyword, use the obtained voice feature of the speaker to match further among the multimedia files matched by the description keyword, wherein, when each multimedia file is matched, the voice feature of the description voice data embedded in the file is recognized, and if the obtained voice feature of the speaker matches the recognized voice feature, the file is determined to be a matching multimedia file.
Processing unit 605 is further configured to perform, on the multimedia files that matching unit 604 matched by voice feature, the action corresponding to the separated action keyword.
Preferably,
Recognition unit 603 is further configured to perform voice feature recognition on the received voice command to obtain the voice feature of the speaker of the command.
Matching unit 604 is further configured to match the stored multimedia files using the obtained voice feature of the speaker, wherein, when each multimedia file is matched, the voice feature of the description voice data embedded in the file is recognized, and if the obtained voice feature of the speaker matches the recognized voice feature, the file is determined to be a matching multimedia file; and then to match, against the separated description keyword, among the multimedia files matched by voice feature.
Processing unit 605 is further configured to perform, on the multimedia files that matching unit 604 matched by description keyword, the action corresponding to the separated action keyword.
Preferably,
Receiving unit 602 is further configured to receive voice data from a speaker.
Recognition unit 603 is further configured to, when receiving unit 602 receives voice data from a speaker, perform voice feature recognition to obtain the voice feature of the speaker; and to perform voice feature recognition on the description voice data embedded in the multimedia file whose copyright is to be determined, to obtain the voice feature of the description voice data.
Matching unit 604 is further configured to determine whether the obtained voice feature of the speaker matches the obtained voice feature of the description voice data.
Processing unit 605 is further configured to, if matching unit 604 determines that the obtained voice feature of the speaker matches the obtained voice feature of the description voice data, determine that the speaker is the copyright owner of the multimedia file; and otherwise, determine that the speaker is not the copyright owner of the multimedia file.
In summary, in the embodiments of the invention, when a voice command is received, the action keyword and the description keyword in the command are separated. The separated description keyword is matched against the stored multimedia files in which description voice data is embedded, and if a multimedia file is matched, the action corresponding to the separated action keyword is performed on the retrieved file. Multimedia files can thus be retrieved simply by voice, without saving or maintaining a mapping between keywords and multimedia files, giving a friendly user experience.
The specific embodiments also include retrieval by the speaker's voice feature and retrieval by a combination of voice feature and description keyword, so that the multimedia files to be retrieved can be found more conveniently and more accurately.
In addition, an embodiment is provided that determines the copyright owner of a multimedia file by voice feature recognition, which protects the copyright of multimedia files simply and effectively.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (14)

1. the method for a multimedia file retrieval, is characterized in that, described method comprises:
Storing multimedia, wherein, described multimedia file, when catching, has embedded the description speech data for this multimedia file;
While receiving the phonetic order of retrieving multimedia file, identify and separate the description keyword of the multimedia file that action keyword in this phonetic order and needs retrieve;
According to isolated description keyword, the multimedia file of storing is mated; Wherein, while mating each multimedia file, by embedding the description speech data of this multimedia file, identify the description keyword of this multimedia file; If isolated description keyword and the description keyword coupling identified, determine the multimedia file of this multimedia file for coupling;
According to isolated action keyword, the multimedia file matched is carried out to corresponding action.
2. The method according to claim 1, characterized in that
the description voice data for the multimedia file is: preset default voice data, or captured description voice data of a speaker for the multimedia file.
3. The method according to claim 1, characterized in that embedding the description voice data for the multimedia file when the multimedia file is captured comprises: embedding the description voice data for the multimedia file into the captured multimedia file in the form of extension data, metadata, a digital watermark, or a form that preserves the original data format.
4. The method according to any one of claims 1 to 3, characterized in that, after separating the action keyword in the voice command and the description keyword of the multimedia file to be retrieved, and before matching the stored multimedia files according to the separated description keyword, the method further comprises:
determining whether the separated description keyword matches a preset description keyword; if so, performing voice feature recognition on the received voice command to obtain the voice feature of the speaker of the voice command; matching the stored multimedia files using the obtained voice feature of the speaker; wherein, when each multimedia file is matched, the voice feature of the description voice data embedded in the multimedia file is recognized; if the obtained voice feature of the speaker matches the recognized voice feature, the multimedia file is determined to be a matched multimedia file; and performing the corresponding action on the matched multimedia file according to the separated action keyword;
otherwise, performing the step of matching the stored multimedia files according to the separated description keyword, and the subsequent steps.
5. The method according to any one of claims 1 to 3, characterized in that, when separating the action keyword in the voice command and the description keyword of the multimedia file to be retrieved, the method further comprises: performing voice feature recognition on the received voice command to obtain the voice feature of the speaker of the voice command;
after matching the stored multimedia files according to the separated description keyword, and before performing the corresponding action on the matched multimedia file according to the separated action keyword, the method further comprises:
further matching, using the obtained voice feature of the speaker, the multimedia files matched by the description keyword; wherein, when each multimedia file is matched, the voice feature of the description voice data embedded in the multimedia file is recognized; and if the obtained voice feature of the speaker matches the recognized voice feature of the voice data, the multimedia file is determined to be a matched multimedia file;
performing the corresponding action on the matched multimedia file according to the separated action keyword comprises: performing the corresponding action, according to the separated action keyword, on the multimedia file matched by voice feature.
6. The method according to any one of claims 1 to 3, characterized in that, when separating the action keyword in the voice command and the description keyword of the multimedia file to be retrieved, the method further comprises: performing voice feature recognition on the received voice command to obtain the voice feature of the speaker of the voice command;
before matching the stored multimedia files according to the separated description keyword, the method further comprises: matching the stored multimedia files using the obtained voice feature of the speaker; wherein, when each multimedia file is matched, the voice feature of the description voice data embedded in the multimedia file is recognized; and if the obtained voice feature of the speaker matches the recognized voice feature of the voice data, the multimedia file is determined to be a matched multimedia file;
wherein matching the stored multimedia files according to the separated description keyword comprises: matching, according to the separated keyword, among the multimedia files matched by voice feature.
7. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
upon receiving voice data of a speaker, performing voice feature recognition to obtain the voice feature of the speaker;
performing voice feature recognition on the description voice data embedded in the multimedia file whose copyright is to be determined, to obtain the voice feature of the description voice data;
determining whether the obtained voice feature of the speaker matches the obtained voice feature of the description voice data;
if the obtained voice feature of the speaker matches the obtained voice feature of the description voice data, determining that the speaker is the copyright owner of the multimedia file; otherwise, determining that the speaker is not the copyright owner of the multimedia file.
8. A device, characterized in that the device comprises: a storage unit, a receiving unit, a recognition unit, a matching unit and a processing unit;
the storage unit is configured to store multimedia files, wherein description voice data for each multimedia file is embedded in the multimedia file when the multimedia file is captured;
the receiving unit is configured to receive a voice command for retrieving a multimedia file;
the recognition unit is configured to, when the receiving unit receives a voice command for retrieving a multimedia file, recognize and separate the action keyword in the voice command and the description keyword of the multimedia file to be retrieved;
the matching unit is configured to match the multimedia files stored in the storage unit according to the description keyword separated by the recognition unit; wherein, when each multimedia file is matched, the description keyword of the multimedia file is recognized from the description voice data embedded in the multimedia file; and if the separated description keyword matches the recognized description keyword, the multimedia file is determined to be a matched multimedia file;
the processing unit is configured to perform the corresponding action on the multimedia file matched by the matching unit, according to the action keyword separated by the recognition unit.
9. The device according to claim 8, characterized in that the description voice data for the multimedia file is: preset default voice data, or captured description voice data of a speaker for the multimedia file.
10. The device according to claim 8, characterized in that,
for each multimedia file stored in the storage unit, the description voice data for the multimedia file is embedded into the captured multimedia file, when the multimedia file is captured, in the form of extension data, metadata, a digital watermark, or a form that preserves the original data format.
11. The device according to any one of claims 8 to 10, characterized in that
the processing unit is further configured to determine whether the description keyword separated by the recognition unit matches a preset description keyword;
the recognition unit is further configured to, if the processing unit determines that the separated description keyword matches the preset description keyword, perform voice feature recognition on the received voice command to obtain the voice feature of the speaker of the voice command;
the matching unit is further configured to match the stored multimedia files using the voice feature of the speaker obtained by the recognition unit; wherein, when each multimedia file is matched, the voice feature of the description voice data embedded in the multimedia file is recognized; if the obtained voice feature of the speaker matches the recognized voice feature, the multimedia file is determined to be a matched multimedia file; and to trigger the processing unit to perform the corresponding action on the matched multimedia file according to the separated action keyword; and, when the processing unit determines that the description keyword separated by the recognition unit does not match the preset description keyword, to match the stored multimedia files according to the separated description keyword.
12. The device according to any one of claims 8 to 10, characterized in that
the recognition unit is further configured to perform voice feature recognition on the received voice command to obtain the voice feature of the speaker of the voice command;
the matching unit is further configured to, after matching the stored multimedia files according to the separated description keyword, further match the multimedia files matched by the description keyword using the obtained voice feature of the speaker; wherein, when each multimedia file is matched, the voice feature of the description voice data embedded in the multimedia file is recognized; and if the obtained voice feature of the speaker matches the recognized voice feature of the voice data, the multimedia file is determined to be a matched multimedia file;
the processing unit is further configured to perform the corresponding action, according to the separated action keyword, on the multimedia file matched by the matching unit by voice feature.
13. The device according to any one of claims 8 to 10, characterized in that
the recognition unit is further configured to perform voice feature recognition on the received voice command to obtain the voice feature of the speaker of the voice command;
the matching unit is further configured to match the stored multimedia files using the obtained voice feature of the speaker; wherein, when each multimedia file is matched, the voice feature of the description voice data embedded in the multimedia file is recognized; and if the obtained voice feature of the speaker matches the recognized voice feature of the voice data, the multimedia file is determined to be a matched multimedia file; and then to match, according to the separated keyword, among the multimedia files matched by voice feature;
the processing unit is further configured to perform the corresponding action, according to the separated action keyword, on the multimedia file matched by the matching unit by description keyword.
14. The device according to any one of claims 8 to 10, characterized in that
the receiving unit is further configured to receive voice data of a speaker;
the recognition unit is further configured to, when the receiving unit receives the voice data of the speaker, perform voice feature recognition to obtain the voice feature of the speaker; and to perform voice feature recognition on the description voice data embedded in the multimedia file whose copyright is to be determined, to obtain the voice feature of the description voice data;
the matching unit is further configured to determine whether the obtained voice feature of the speaker matches the obtained voice feature of the description voice data;
the processing unit is further configured to, if the matching unit determines that the obtained voice feature of the speaker matches the obtained voice feature of the description voice data, determine that the speaker is the copyright owner of the multimedia file; otherwise, determine that the speaker is not the copyright owner of the multimedia file.
CN201310469487.3A 2013-10-10 2013-10-10 A kind of method and apparatus of multimedia document retrieval Active CN103455642B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310469487.3A CN103455642B (en) 2013-10-10 2013-10-10 A kind of method and apparatus of multimedia document retrieval

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310469487.3A CN103455642B (en) 2013-10-10 2013-10-10 A kind of method and apparatus of multimedia document retrieval

Publications (2)

Publication Number Publication Date
CN103455642A true CN103455642A (en) 2013-12-18
CN103455642B CN103455642B (en) 2017-03-08

Family

ID=49738005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310469487.3A Active CN103455642B (en) 2013-10-10 2013-10-10 A kind of method and apparatus of multimedia document retrieval

Country Status (1)

Country Link
CN (1) CN103455642B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761261A (en) * 2013-12-31 2014-04-30 北京紫冬锐意语音科技有限公司 Voice recognition based media search method and device
CN104462307A (en) * 2014-11-28 2015-03-25 深圳市中兴移动通信有限公司 Searching method and device for object in terminal
CN104660819A (en) * 2015-02-27 2015-05-27 北京京东尚科信息技术有限公司 Mobile equipment and method for accessing file in mobile equipment
CN106657537A (en) * 2016-12-07 2017-05-10 努比亚技术有限公司 Terminal voice search call recording device and method
CN107909871A (en) * 2017-12-26 2018-04-13 安徽声讯信息技术有限公司 A kind of tablet computer of intelligent sound teaching
CN109471953A (en) * 2018-10-11 2019-03-15 平安科技(深圳)有限公司 A kind of speech data retrieval method and terminal device
CN113190647A (en) * 2021-04-15 2021-07-30 北京小米移动软件有限公司 Media file playing method, media file playing device and storage medium
US11379698B2 (en) 2017-09-22 2022-07-05 Huawei Technologies Co., Ltd. Sensor data processing method and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1014258A2 (en) * 1998-12-23 2000-06-28 Hewlett-Packard Company Automatic data routing via voice command annotation
CN101916266A (en) * 2010-07-30 2010-12-15 优视科技有限公司 Voice control web page browsing method and device based on mobile terminal
CN102708185A (en) * 2012-05-11 2012-10-03 广东欧珀移动通信有限公司 Picture voice searching method
CN103218454A (en) * 2013-05-06 2013-07-24 百度在线网络技术(北京)有限公司 Voice-data-based file searching method, voice-data-based file device and voice-data-based file system
CN103280217A (en) * 2013-05-02 2013-09-04 锤子科技(北京)有限公司 Voice identification method and device of mobile terminal

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761261A (en) * 2013-12-31 2014-04-30 北京紫冬锐意语音科技有限公司 Voice recognition based media search method and device
CN103761261B (en) * 2013-12-31 2017-07-28 北京紫冬锐意语音科技有限公司 A kind of media search method and device based on speech recognition
CN104462307A (en) * 2014-11-28 2015-03-25 深圳市中兴移动通信有限公司 Searching method and device for object in terminal
CN104660819A (en) * 2015-02-27 2015-05-27 北京京东尚科信息技术有限公司 Mobile equipment and method for accessing file in mobile equipment
CN106657537A (en) * 2016-12-07 2017-05-10 努比亚技术有限公司 Terminal voice search call recording device and method
US11379698B2 (en) 2017-09-22 2022-07-05 Huawei Technologies Co., Ltd. Sensor data processing method and apparatus
CN107909871A (en) * 2017-12-26 2018-04-13 安徽声讯信息技术有限公司 A kind of tablet computer of intelligent sound teaching
CN109471953A (en) * 2018-10-11 2019-03-15 平安科技(深圳)有限公司 A kind of speech data retrieval method and terminal device
CN113190647A (en) * 2021-04-15 2021-07-30 北京小米移动软件有限公司 Media file playing method, media file playing device and storage medium

Also Published As

Publication number Publication date
CN103455642B (en) 2017-03-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant