CN109271533A - A multimedia document retrieval method - Google Patents
- Publication number
- CN109271533A (CN201811117840.0A)
- Authority
- CN
- China
- Prior art keywords
- reference picture
- image
- characteristic point
- instruction
- speech retrieval
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present invention relate to the technical field of intelligent security and disclose a multimedia document retrieval method. The method is applied to a network-attached storage (NAS) device that stores multimedia files, and comprises: receiving a speech retrieval instruction; determining a reference image according to the speech retrieval instruction; determining, from the multimedia files and according to the reference image, a target image that satisfies the speech retrieval instruction; and intercepting a video clip of a preset time period that includes the target image and storing it in a destination folder, wherein, when one video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold. The invention achieves automatic, efficient, and highly accurate multimedia document retrieval.
Description
Technical field
The embodiments of the present invention relate to the technical field of intelligent security, and in particular to a multimedia document retrieval method.
Background art
Intelligent security technology refers to the informatization of services and to the transmission and storage of images. With the spread of Internet of Things technology, urban security has developed from simple standalone security systems into integrated city-wide systems.
Multimedia files are an important information source for intelligent security. To implement intelligent security for a region or a system of regions, for example to analyze the traffic flow, vehicle positions, public facility safety, or meteorological conditions of a road section, a large amount of multimedia file information must be collected or stored in real time. Because the data volume is huge and the data has high security requirements, this multimedia file information is usually stored on a network-attached storage (NAS) device.
In the course of making the present invention, the inventor found at least the following problem in the related art: at present, finding all video clips that satisfy a given condition on a NAS device, for example all clips featuring a particular celebrity in the multimedia video file of an awards ceremony, requires playing the video file from the beginning and manually cutting out a clip whenever a frame featuring that celebrity is observed. This is time-consuming and laborious, and clips may be missed.
Summary of the invention
The embodiments of the present invention provide an automatic, efficient, and highly accurate multimedia document retrieval method.
To solve the above technical problem, the embodiments of the present invention disclose the following technical solutions:
In a first aspect, an embodiment of the present invention provides a multimedia document retrieval method applied to a network-attached storage (NAS) device, the NAS device being used to store multimedia files, the method comprising:
Receiving a speech retrieval instruction;
Determining a reference image according to the speech retrieval instruction;
Determining, from the multimedia files and according to the reference image, a target image that satisfies the speech retrieval instruction;
Intercepting a video clip of a preset time period that includes the target image and storing it in a destination folder, wherein, when one video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold.
Optionally, the method further comprises:
Receiving a retrieval range instruction;
Determining, according to the retrieval range instruction, the multimedia files to be retrieved.
Optionally, receiving the speech retrieval instruction comprises:
Acquiring voice information through a voice capture device;
Identifying whether the voice information is in a default language;
If so, converting the voice information into text information and sending it to the NAS device;
If not, converting the voice information into the default language, converting it into text information, and sending the text information to the NAS device.
Optionally, determining the reference image according to the speech retrieval instruction comprises:
Parsing the speech retrieval instruction to determine its keywords;
Obtaining associated images from the Internet or a local database according to the keywords;
Determining the reference image from the associated images.
Optionally, parsing the speech retrieval instruction to determine its keywords comprises:
Converting the voice information into text information and classifying the text information;
Performing statistical analysis on the classified text information to determine the keywords of the speech retrieval instruction.
Optionally, determining the reference image from the associated images comprises:
Receiving a user operation instruction;
Determining the reference image according to the operation instruction;
Or,
Ranking the associated images by priority according to reference frequency, image sharpness, or update time;
Determining the reference image according to the priority.
Optionally, determining, from the multimedia files and according to the reference image, the target image that satisfies the speech retrieval instruction comprises:
Identifying the reference image feature points of the reference image;
Splitting the multimedia file into image frames;
Judging whether the reference image feature points match the image feature points of each image frame;
Counting, according to the judgment result, the number of matches between the reference image feature points and the image feature points of each image frame;
Determining the confidence of the image according to the number of matches;
Determining, according to the confidence, the target image that satisfies the speech retrieval instruction.
Optionally, counting, according to the judgment result, the number of matches between the reference image feature points and the image feature points of each image frame comprises:
If a reference image feature point does not match the image feature points of the image frame, continuing to judge whether the next reference image feature point matches the image feature points of the image frame;
If a reference image feature point matches an image feature point of the image frame, counting it toward the number of matches between the reference image feature points and the image feature points of that image frame.
Optionally, determining, according to the confidence, the target image that satisfies the speech retrieval instruction comprises:
Judging whether the confidence is higher than a preset confidence threshold;
If so, determining that the image corresponding to the image frame is a target image that satisfies the speech retrieval instruction.
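The threshold test above can be illustrated with a short Python sketch. The source does not say how the confidence is derived from the number of matches, so the ratio used here (matches divided by the number of reference image feature points) is an assumption, as are the function names `confidence` and `select_target_frames`.

```python
from typing import List


def confidence(match_count: int, reference_point_count: int) -> float:
    """Assumed confidence measure: fraction of reference feature points matched."""
    if reference_point_count <= 0:
        return 0.0
    return match_count / reference_point_count


def select_target_frames(match_counts: List[int], reference_point_count: int,
                         threshold: float) -> List[int]:
    """Return the indices of frames whose confidence is higher than the threshold."""
    return [
        index
        for index, count in enumerate(match_counts)
        if confidence(count, reference_point_count) > threshold
    ]
```

For a reference image with 10 feature points and a threshold of 0.6, frames with 8 or 10 matches would be kept and frames with 0 or 5 matches would be dropped.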
Optionally, the method further comprises:
Cutting or merging the video clips;
Generating a corresponding video link for each processed video clip.
In a second aspect, an embodiment of the present invention provides a multimedia document retrieval device applied to a network-attached storage (NAS) device, the NAS device being used to store multimedia files, the device comprising:
A first receiving unit, configured to receive a speech retrieval instruction;
A first determination unit, configured to determine a reference image according to the speech retrieval instruction;
A second determination unit, configured to determine, from the multimedia files and according to the reference image, a target image that satisfies the speech retrieval instruction;
An interception unit, configured to intercept a video clip of a preset time period that includes the target image and store it in a destination folder, wherein, when one video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold.
Optionally, the device further comprises:
A second receiving unit, configured to receive a retrieval range instruction;
A third determination unit, configured to determine, according to the retrieval range instruction, the multimedia files to be retrieved.
Optionally, the first receiving unit is specifically configured to:
Acquire voice information through a voice capture device;
Identify whether the voice information is in a default language;
If so, convert the voice information into text information and send it to the NAS device;
If not, convert the voice information into the default language, convert it into text information, and send the text information to the NAS device.
Optionally, the first determination unit is specifically configured to:
Parse the speech retrieval instruction to determine its keywords;
Obtain associated images from the Internet or a local database according to the keywords;
Determine the reference image from the associated images.
Optionally, the second determination unit is specifically configured to:
Identify the reference image feature points of the reference image;
Split the multimedia file into image frames;
Judge whether the reference image feature points match the image feature points of each image frame;
Count, according to the judgment result, the number of matches between the reference image feature points and the image feature points of each image frame;
Determine the confidence of the image according to the number of matches;
Determine, according to the confidence, the target image that satisfies the speech retrieval instruction.
Optionally, the device further comprises:
A processing unit, configured to cut or merge the video clips;
A generation unit, configured to generate a corresponding video link for each processed video clip.
In a third aspect, an embodiment of the present invention provides a network-attached storage (NAS) device, comprising:
At least one processor; and
A memory communicatively connected to the at least one processor; wherein
The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the multimedia document retrieval method described above.
In a fourth aspect, an embodiment of the present invention further provides a non-volatile computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to cause a computer to perform the multimedia document retrieval method described above.
The beneficial effects of the embodiments of the present invention are as follows: in contrast to the prior art, the embodiments of the present invention provide a multimedia document retrieval method. A speech retrieval instruction is received; a reference image is determined according to the speech retrieval instruction; a target image that satisfies the speech retrieval instruction is determined from the multimedia files according to the reference image; and a video clip of a preset time period that includes the target image is intercepted and stored in a destination folder, wherein, when one video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold. This achieves automatic, efficient, and highly accurate multimedia document retrieval.
Brief description of the drawings
One or more embodiments are illustrated by way of example in the figures of the corresponding drawings. These illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures in the drawings are not drawn to scale.
Fig. 1 is a schematic diagram of the network structure for the multimedia document retrieval method provided by an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the terminal device 10 in Fig. 1;
Fig. 3 is a schematic diagram of the storage regions of a network-attached storage (NAS) device provided by an embodiment of the present invention;
Fig. 4 is a flow diagram of a multimedia document retrieval method provided by an embodiment of the present invention;
Fig. 5 is a flow diagram of step S11 in Fig. 4;
Fig. 6 is a flow diagram of step S12 in Fig. 4;
Fig. 7 is a flow diagram of step S121 in Fig. 6;
Fig. 8 is a flow diagram of step S123 in Fig. 6;
Fig. 9 is another flow diagram of step S123 in Fig. 6;
Fig. 10 is a flow diagram of step S13 in Fig. 4;
Fig. 11 is a flow diagram of step S134 in Fig. 10;
Fig. 12 is a flow diagram of step S136 in Fig. 10;
Fig. 13 is an application schematic diagram of a multimedia document retrieval method provided by an embodiment of the present invention;
Fig. 14 is a flow diagram of a multimedia document retrieval method provided by another embodiment of the present invention;
Fig. 15 is a flow diagram of a multimedia document retrieval method provided by yet another embodiment of the present invention;
Fig. 16 is a structural schematic diagram of a multimedia document retrieval device provided by an embodiment of the present invention;
Fig. 17 is a structural schematic diagram of a multimedia document retrieval device provided by another embodiment of the present invention;
Fig. 18 is a structural schematic diagram of a multimedia document retrieval device provided by yet another embodiment of the present invention;
Fig. 19 is a structural schematic diagram of a NAS device provided by an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.
Referring to Fig. 1, Fig. 1 is a schematic diagram of the network structure for the multimedia document retrieval method provided by an embodiment of the present invention. As shown in Fig. 1, the network structure includes at least: a terminal device 10, a gateway 20, a network-attached storage (NAS) device 30, a local area network 40, and an acquisition device 50. Note that the connections shown in the figure may be wired or wireless. For example, the gateway 20 and the NAS device 30 may be connected by a communication cable, or by a wireless WiFi module, a wireless Bluetooth module, or the like.
The terminal device 10 is a user-operable device with some processing and display capability. In this embodiment, the terminal device 10 includes a portable device 101 or a computer 102, where the portable device 101 includes a laptop, tablet computer, smartphone, and the like, and the computer 102 includes a desktop computer, intelligent control system, smart refrigerator, smart washing machine, and the like.
The portable device 101 or the computer 102 obtains the target multimedia file from the NAS device 30 after authentication and addressing by the gateway 20. The portable device 101 and the computer 102 include a user interface 11, which is the window for human-computer interaction and may be an LED, LCD, or CRT display. In some embodiments, the portable device 101 or the computer 102 includes a hardware input device; the portable device 101 or the computer 102 receives instructions from the hardware input device, displays them in the user interface 11, and executes them.
Specifically, in this embodiment, the portable device 101 or the computer 102 includes a voice capture device 21, mounted inside or on the surface of the portable device 101 or the computer 102. It mainly performs signal conditioning and signal acquisition, converting the original voice signal into a sequence of speech segments. Generally, the voice capture device 21 performs signal processing such as acoustic-to-electric conversion, signal conditioning, and signal sampling. For example, on entering the user interface 11 for voice input, the user speaks according to the prompt information and the voice capture device 21 captures the voice information; alternatively, the voice capture device 21 is triggered to turn on and capture the user's speech retrieval instruction.
In this embodiment, as shown in Fig. 2, the portable device 101 or the computer 102 further includes a preprocessing module 22, a voice training module 23, a speech recognition module 24, and a voice prompt module 25. The preprocessing module 22 is connected to the voice capture device 21 and is used to filter out interference signals, extract speech feature vectors, and quantize the extracted speech feature vectors into standard speech feature vectors. When the voice training module 23 is connected to the preprocessing module 22, it performs probability statistics on the standard speech feature vectors collected and extracted over multiple acquisitions and extracts the speaker's best standard speech feature vector, which prevents factors such as the speaker's mood or the environment from making the extracted feature parameters inaccurate and degrading recognition. This module therefore mainly involves probability statistics and parameter estimation, and is implemented with a hidden Markov model (HMM). When the speech recognition module 24 is connected to the preprocessing module 22, it compares the newly collected standard speech feature vector with the speech models in the voice template library to determine the function of the current voice command; this module therefore mainly involves vector comparison and parameter estimation. The voice prompt module 25 is connected to the speech recognition module 24 and, according to the recognition result, prompts the user to perform a related operation or announces the function just completed; this module therefore mainly involves calling prompt voice resource files, D/A conversion, signal amplification, and similar processing.
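The vector-comparison step of the speech recognition module could look roughly like the Python sketch below. Note that it substitutes a plain nearest-template search by Euclidean distance for the HMM-based modelling the text mentions, purely to keep the example short. The function name `recognize_command`, the dictionary-based template library, and the `max_distance` cutoff are assumptions for illustration.

```python
import math
from typing import Dict, Optional, Sequence


def recognize_command(feature: Sequence[float],
                      templates: Dict[str, Sequence[float]],
                      max_distance: float) -> Optional[str]:
    """Return the name of the closest template, or None if nothing is close enough.

    feature   -- standard speech feature vector of the new utterance
    templates -- hypothetical template library: command name -> feature vector
    """
    best_name: Optional[str] = None
    best_distance = float("inf")
    for name, template in templates.items():
        distance = math.dist(feature, template)  # Euclidean distance
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= max_distance else None
```

The cutoff is what lets the module reject an utterance that resembles no stored command instead of forcing it onto the nearest one.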
In some embodiments, the user interface 11 includes a search condition box, a confirm button, a return-to-previous-level button, an enter-next-level button, and so on. For example, to retrieve the image at time point C of junction B on road A, the keywords converted from the voice information are entered in the search condition box. The portable device 101 or the computer 102 performs a preliminary analysis and judgment of the search condition and sends, via the gateway 20, a request to the corresponding NAS device 30 for the image at time point C of junction B on road A. The NAS device 30 returns a response to the request, and the response carries the image at time point C of junction B on road A.
In some embodiments, when the local area network 40 is a Zigbee ad hoc network, the local area network 40 is connected to the processor of the NAS device 30 through a serial port or a microcontroller, so as to transfer data between the Zigbee network and Ethernet.
Referring also to Fig. 3, the NAS device 30 is divided into multiple storage regions 31. In this embodiment, the storage regions 31 include a person image library, an animal image library, a device image library, a person sound library, and an animal sound library. Each storage region 31 includes a key frame 311, a file identifier 312, and a timestamp 313. The key frame 311 contains image classification information or audio classification information; the file identifier 312 is in effect the file name, generally composed of a prefix name and a suffix name; the timestamp 313 is a character string that uniquely identifies a moment in time. The key frame 311 determines the classification of an image, the classification determines the storage region 31, and the file identifier 312 and timestamp 313 further determine the multimedia file stored in a given storage region 31.
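The two-stage lookup described above (classification first, then file identifier and timestamp) can be sketched in Python as follows. The `StoredFile` record, the `build_index` and `find` helpers, and the sample category strings are hypothetical; the source specifies only that each storage region carries a key frame, a file identifier, and a timestamp.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class StoredFile:
    category: str    # classification carried by the key frame, e.g. "person"
    file_name: str   # file identifier: prefix name plus suffix name
    timestamp: str   # string that uniquely identifies a moment in time


def build_index(files: Iterable[StoredFile]) -> Dict[str, List[StoredFile]]:
    """Group files by category, so that each category acts as one storage region."""
    index: Dict[str, List[StoredFile]] = {}
    for stored in files:
        index.setdefault(stored.category, []).append(stored)
    return index


def find(index: Dict[str, List[StoredFile]], category: str,
         file_name: Optional[str] = None,
         timestamp: Optional[str] = None) -> List[StoredFile]:
    """Pick the region by category, then narrow by file name and/or timestamp."""
    return [
        stored
        for stored in index.get(category, [])
        if (file_name is None or stored.file_name == file_name)
        and (timestamp is None or stored.timestamp == timestamp)
    ]
```

Selecting the region first means a query for a person never has to scan the animal or device libraries.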
In this embodiment, the acquisition device 50 includes a camera device 501 and/or a sound recording device 502 and/or a sensing device 503, etc. The camera device 501 may be a video camera; at least one camera is deployed in the physical space according to certain rules or the actual situation, and the camera may be combined with a multi-axis motor to maximize its coverage. In some embodiments, an integrated camera may be chosen instead of combining a camera with a multi-axis motor, for example a dome all-in-one camera, a high-speed dome all-in-one camera, an all-in-one camera combined with a pan-tilt head, or an all-in-one camera with the lens built into the pan-tilt head; these all-in-one cameras support autofocus. Preferably, a camera is selected that is waterproof, compact, high-resolution, and long-lived, and that has a universal communication interface.
It can be understood that the acquisition device 50 may be an electronic device that includes a camera device 501 and/or a sound recording device 502 and/or a sensing device 503, for example a smart KTV system, an intelligent access control system, or a smartphone with a built-in camera and recorder. Alternatively, the acquisition device 50 may be a standalone camera device 501 or sound recording device 502, for example security monitoring in a residential community. In some embodiments, the standalone camera device 501 or sound recording device 502 also works with the sensing device 503, so that the camera device 501 or sound recording device 502 is controlled to enter a preset working mode (for example, switching it on or off) according to the physical signal collected by the sensing device 503.
Referring to Fig. 4, Fig. 4 is a flow diagram of a multimedia document retrieval method provided by an embodiment of the present invention. As shown in Fig. 4, the method is applied to a network-attached storage (NAS) device used to store multimedia files, and comprises:
S11: Receive a speech retrieval instruction.
As shown in Fig. 5, in this embodiment step S11 specifically includes:
S111: Acquire voice information through a voice capture device.
S112: Identify whether the voice information is in a default language.
S113: If so, convert the voice information into text information and send it to the NAS device.
S114: If not, convert the voice information into the default language, convert it into text information, and send the text information to the NAS device.
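Steps S112 to S114 amount to a small branch, which the following Python sketch makes explicit. The language detector, translator, and transcriber are passed in as callables because the source names no concrete components; `prepare_instruction`, its parameters, and the default language code `"zh"` are all assumptions for illustration.

```python
from typing import Any, Callable


def prepare_instruction(audio: Any,
                        detect_language: Callable[[Any], str],
                        translate: Callable[[Any, str, str], Any],
                        transcribe: Callable[[Any], str],
                        default_language: str = "zh") -> str:
    """Return the text to send to the NAS device for a captured utterance."""
    language = detect_language(audio)                          # S112
    if language != default_language:
        # S114: bring the utterance into the default language first.
        audio = translate(audio, language, default_language)
    # S113 / S114: convert the (possibly translated) speech into text.
    return transcribe(audio)
```

Keeping the three components injectable lets the same control flow run on the terminal device with whatever speech services it actually has available.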
It can be understood that the terminal device receives the speech retrieval instruction (that is, the voice capture device acquires the voice information), and the instruction is in a form the terminal can recognize and process. The speech retrieval instruction includes file name keywords, place keywords, time keywords, person keywords, animal keywords, device keywords, and so on. For example, when the speech retrieval instruction is "retrieve XX campus, XX auditorium, XX time period, principal XX", "XX campus, XX auditorium" is the place keyword, "XX time period" is the time keyword, and "principal XX" is the person keyword.
The speech retrieval instruction may be voice information that the user speaks in real time, or voice information recorded in advance on the terminal device. Because the voice capture device inevitably picks up noise when acquiring voice information, the preprocessing module can filter the signal to reduce interference and the processing workload.
S12: Determine a reference image according to the speech retrieval instruction.
As shown in Fig. 6, in this embodiment step S12 specifically includes:
S121: Parse the speech retrieval instruction to determine its keywords.
Referring also to Fig. 7, in this embodiment step S121 specifically includes:
S1211: Convert the voice information into text information and classify the text information.
In this embodiment, the text information can be classified by the time, place, duration, size, and so on of the data, which reduces the processing load on the NAS device once the classified information is sent to it.
In some embodiments, the voice information is first converted into natural-language text on the terminal device; the natural-language text is then sent to the NAS device, where it is converted into the text information. This greatly reduces the processing load on the NAS device.
S1212: Perform statistical analysis on the classified text information to determine the keywords of the speech retrieval instruction.
The vocabulary in the text information is counted: text information containing identical, similar, or associated words is grouped into one class, and text information containing words with the same attribute is grouped into one class, which facilitates subsequent analysis and keyword extraction.
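One simple way to realize the word-counting part of S1212 is a frequency count over the transcribed text, sketched below in Python. The source does not specify the statistical method, so whitespace tokenization, the stop-word filter, and the `extract_keywords` name are assumptions; grouping by similar or associated words is not covered by this sketch.

```python
from collections import Counter
from typing import List, Set


def extract_keywords(text: str, stop_words: Set[str], top_n: int) -> List[str]:
    """Return the top_n most frequent words that are not stop words.

    Ties are broken alphabetically so the result is deterministic.
    """
    words = [word for word in text.lower().split() if word not in stop_words]
    counts = Counter(words)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]
```

For Chinese input a word segmenter would have to replace the whitespace split, since Chinese text does not separate words with spaces.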
S122: Obtain associated images from the Internet or a local database according to the keywords.
For example, when retrieving images of principal XX, all image information for that principal is obtained from the Internet or a local database. The associated images may be multiple images of the same person or thing at different times and against different backgrounds, or different images of persons or things of the same type, for example "Donald Duck" cartoon images in different styles.
S123: Determine the reference image from the associated images.
As shown in Fig. 8, in this embodiment step S123 specifically includes:
S1231: Receive a user operation instruction.
S1232: Determine the reference image according to the operation instruction.
The above is the mode in which the user manually confirms the reference image. In this embodiment, multiple associated images are presented in the user interface of the terminal device, and at least one reference image is determined according to the user's touch operation.
Alternatively, as shown in Fig. 9, in this embodiment step S123 specifically includes:
S1233: Rank the associated images by priority according to reference frequency, image sharpness, or update time.
S1234: Determine the reference image according to the priority.
The above is the mode in which the system automatically confirms the reference image. Ranking and pushing images by reference frequency, image sharpness, or update time better fits the habits of most users and improves retrieval efficiency.
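The automatic mode (S1233 and S1234) can be sketched in Python as a sort followed by taking the first element. The `AssociatedImage` fields, the function names, and the choice to treat larger values as higher priority for all three criteria are assumptions; the source names the criteria but not how they are measured.

```python
from dataclasses import dataclass
from typing import List, Optional

CRITERIA = ("reference_count", "sharpness", "updated_at")


@dataclass
class AssociatedImage:
    name: str
    reference_count: int   # how often the image has been referenced
    sharpness: float       # larger means clearer (assumed scale)
    updated_at: float      # e.g. Unix time; larger means more recent


def rank_images(images: List[AssociatedImage], criterion: str) -> List[AssociatedImage]:
    """S1233: order images from highest to lowest priority for one criterion."""
    if criterion not in CRITERIA:
        raise ValueError(f"unknown criterion: {criterion}")
    return sorted(images, key=lambda image: getattr(image, criterion), reverse=True)


def pick_reference(images: List[AssociatedImage], criterion: str) -> Optional[AssociatedImage]:
    """S1234: take the highest-priority image as the reference image."""
    ranked = rank_images(images, criterion)
    return ranked[0] if ranked else None
```

Because `sorted` is stable, images that tie on the chosen criterion keep the order in which they were retrieved.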
S13: according to the reference picture, the target of the speech retrieval instruction is determined for compliance with from the multimedia file
Image.
As shown in Figure 10, in the present embodiment, step S13 is specifically included:
S131: the reference picture characteristic point of the reference picture is identified.
For example, when identifying fast-moving consumer goods (FMCG), it is not enough to recognize a bottle-shaped package; the system must also recognize whether it is a bottle of yogurt or of beer, and, for yogurt, which brand and even which flavor and specification. The reference picture feature points include figurative trademarks, word (font) trademarks, keywords, product shape, packaging color, packaging pattern, bar code, and so on. Which image feature points need to be extracted and compared can be preset, and the reference picture feature points can also be compared at the level of the smallest image unit, which reduces repetitive recognition work and improves efficiency.
In some embodiments, a deep network learning model can be used to memorize and recognize as much image data of the same type of article as possible, so as to build a large training database. The goods are photographed through 360° in a variety of simulated scenes to obtain the richest possible training data, and a machine or network device learns from the training data to build a recognition model.
S132: Split the multimedia file into image frames.
S133: Judge whether the reference picture feature points match the image feature points of each image frame.
It should be noted that the reference picture feature points and the image feature points of each image frame are in one-to-one correspondence, and a point-by-point comparison can be made using three-dimensional coordinates in the same coordinate system. For example, the figurative trademark of the reference picture is compared with the figurative trademark of each image frame, so that the comparison of feature points is meaningful.
S134: According to the judgment result, count the number of matches between the reference picture feature points and the image feature points of each image frame.
Referring also to Figure 11, in this embodiment, step S134 specifically includes:
S1341: If a reference picture feature point does not match the image feature points of an image frame, continue to judge whether the next reference picture feature point matches the image feature points of that image frame.
S1342: If a reference picture feature point matches an image feature point of an image frame, count it toward the number of matches between the reference picture feature points and the image feature points of that image frame.
When a large number of feature points need to be compared and a reference picture feature point is found not to match the image feature points of an image frame, the process should continue by judging whether the next reference picture feature point matches, rather than terminating the judgment or starting it over. This further improves processing efficiency and also takes full account of environmental and other factors, for example the case in which individual feature points cannot be recognized or fail to match.
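The counting loop of steps S1341 and S1342 can be sketched as follows. The sketch assumes, purely for illustration, that each feature point is stored under a type name (such as "logo" or "shape") so that like is compared with like, and that two points match when their values are equal; a real matcher would substitute its own comparison through the hypothetical `is_match` parameter.

```python
def count_matches(reference_features, frame_features, is_match=None):
    """Count reference feature points that match the same-type point in one frame."""
    if is_match is None:
        # Placeholder comparison; an actual system would compare descriptors.
        is_match = lambda ref, cand: ref == cand
    matches = 0
    for kind, ref_value in reference_features.items():
        frame_value = frame_features.get(kind)
        if frame_value is None or not is_match(ref_value, frame_value):
            # S1341: point not recognized or not matched, go on to the next point
            continue
        # S1342: the point matches, so it counts toward the total
        matches += 1
    return matches


reference = {"logo": "brand_x", "color": "blue", "shape": "bottle", "barcode": "0001"}
frame = {"logo": "brand_x", "color": "white", "shape": "bottle"}
print(count_matches(reference, frame))  # 2
```

A missing or mismatched point is skipped rather than treated as a failure of the whole frame, which is the behaviour the paragraph above describes.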
S135: Determine the confidence level of the image according to the number of matches.
If only one or a few feature points match, the match may be accidental, so the judgment result may be in error.
S136: According to the confidence level, determine the target image that matches the speech retrieval instruction.
Referring also to Figure 12, in this embodiment, step S136 specifically includes:
S1361: Judge whether the confidence level is higher than a preset confidence threshold.
S1362: If so, determine that the image corresponding to the image frame is a target image that matches the speech retrieval instruction.
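Steps S135 and S136 can be illustrated with a short sketch. The embodiment does not state how the confidence level is computed from the number of matches; the ratio of matched points to reference points used here, the default threshold of 0.6, and the function names are all assumptions made for the example.

```python
def confidence(match_count, reference_count):
    """S135 (assumed formula): fraction of reference feature points that matched."""
    if reference_count <= 0:
        return 0.0
    return match_count / reference_count


def select_target_frames(match_counts, reference_count, threshold=0.6):
    """S1361/S1362: keep frames whose confidence is strictly higher than the threshold."""
    return [
        index
        for index, matches in enumerate(match_counts)
        if confidence(matches, reference_count) > threshold
    ]


# One match count per image frame, with four reference feature points in total.
print(select_target_frames([0, 1, 3, 4, 2], reference_count=4))  # [2, 3]
```

With this formula a frame that matches only one point out of four scores 0.25 and is rejected, which reflects the concern that a single match may be accidental.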
S14: Intercept a video clip of a preset time period that includes the target image and store it in a destination folder, wherein, when a video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold.
Please refer to Figure 13, which is an application schematic diagram of a multimedia document retrieval method provided by an embodiment of the present invention. As shown in Figure 13, a video clip may include one target image or multiple target images.
The preset time periods can be equal, that is, every video clip has the same length. For example, each video clip in "file 2" has length t1, in which case two adjacent video clips may overlap. The preset time periods can also be unequal, that is, the video clips differ in length. For example, in "file 3", video clip 1 has length t2 and video clip 2 has length t3, where t2 is greater than t3. In this way, all target images in the multimedia file can be selected and intercepted. When a video clip includes two or more target images, as in video clip 1 of length t2, the interval ti between two adjacent target images is less than the preset time threshold. In other words, if the interval ti between two adjacent target images is less than the preset time threshold, the two target images belong to the same video clip.
It should be noted that the preset time threshold can be set manually by the user or adjusted dynamically for different multimedia files.
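The grouping rule for unequal clip lengths can be sketched as below. Target images are represented by their timestamps in seconds. The `padding` added before the first and after the last target image of a clip, the optional `duration` clamp, and the function name are assumptions for illustration; the embodiment only states that adjacent target images closer together than the preset time threshold belong to the same video clip.

```python
def group_into_clips(timestamps, threshold, padding=0.0, duration=None):
    """Group target-image timestamps into (start, end) clip spans."""
    groups = []
    for t in sorted(timestamps):
        if groups and t - groups[-1][-1] < threshold:
            # Interval below the preset time threshold: same video clip
            groups[-1].append(t)
        else:
            # Otherwise a new video clip starts at this target image
            groups.append([t])
    spans = []
    for group in groups:
        start = max(0.0, group[0] - padding)
        end = group[-1] + padding
        if duration is not None:
            end = min(duration, end)
        spans.append((start, end))
    return spans


print(group_into_clips([10.0, 12.0, 13.5, 40.0], threshold=5.0, padding=2.0, duration=41.0))
# [(8.0, 15.5), (38.0, 41.0)]
```

The first three target images fall into one longer clip and the fourth into a shorter one, matching the t2 and t3 example above.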
The multimedia document retrieval method provided by this embodiment of the present invention receives a speech retrieval instruction, determines a reference picture according to the speech retrieval instruction, determines from the multimedia file, according to the reference picture, the target image that matches the speech retrieval instruction, and intercepts a video clip of a preset time period that includes the target image and stores it in a destination folder, wherein, when a video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold. Automatic, efficient, and highly accurate multimedia document retrieval is thereby achieved.
As shown in Figure 14, an embodiment of the present invention further provides another multimedia document retrieval method, in which the method further includes:
S15: Receive a retrieval range instruction.
S16: Determine the multimedia file to be retrieved according to the retrieval range instruction.
It can be understood that, because a large number of multimedia files are stored in the Network Attached Storage device, accessing all of the data on every retrieval would generate a great deal of pointless work. To improve efficiency and reduce the processing load on the processor, a retrieval range therefore needs to be introduced.
Referring to Figure 13, assume that the Network Attached Storage device contains "file 1", "file 2", and "file 3". When "file 1" is clearly not what the user needs to retrieve, or does not match the speech retrieval instruction, a restriction can be added through the retrieval range instruction to exclude "file 1" and screen out a suitable retrieval range (that is, determine the multimedia file to be retrieved). The retrieval of multimedia files is then carried out within this range.
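Steps S15 and S16 amount to filtering the stored files before any matching is done. The sketch below assumes, for illustration only, that the retrieval range instruction has already been parsed into optional `include` and `exclude` sets of file names; the embodiment does not specify the form of the instruction.

```python
def apply_retrieval_range(stored_files, include=None, exclude=None):
    """S16: return the multimedia files that remain inside the retrieval range."""
    selected = []
    for name in stored_files:
        if include is not None and name not in include:
            continue  # outside an explicit allow-list
        if exclude is not None and name in exclude:
            continue  # explicitly ruled out by the user
        selected.append(name)
    return selected


stored = ["file 1", "file 2", "file 3"]
print(apply_retrieval_range(stored, exclude={"file 1"}))  # ['file 2', 'file 3']
```

Only the files returned here would then be split into image frames and compared with the reference picture.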
As shown in Figure 15, an embodiment of the present invention further provides another multimedia document retrieval method, in which the method further includes:
S17: Cut or merge the video clips.
S18: Generate a corresponding video link for the processed video clip.
To enable users to share clips, and considering that the storage capacity of a user's terminal device is limited, in some embodiments the processed video clip is sent to the Network Attached Storage device for storage. Meanwhile, so that other users can conveniently watch the video clip without consuming excessive data traffic, the Network Attached Storage device is controlled to generate a corresponding video link for the video clip, so that other users can obtain the video clip through the network link.
Further, so that other users can understand the content of the video clip and decide whether to watch it, the method also includes controlling the Network Attached Storage device to generate a preview of the video clip and bind it to the video link. The content of the video clip can then be understood through the preview.
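Binding a preview to a generated video link can be sketched as a small record. Everything concrete here is hypothetical: the base URL, the random token scheme, and the dictionary layout are assumptions chosen for the example, and the embodiment does not describe how links are formed.

```python
import secrets


def make_video_link(clip_path, preview_path=None,
                    base_url="https://nas.example.invalid/share"):
    """S18 (sketch): create a share record tying a link to a clip and its preview."""
    token = secrets.token_urlsafe(16)  # unguessable identifier for the link
    return {
        "url": f"{base_url}/{token}",
        "clip": clip_path,
        "preview": preview_path,  # bound preview, if one was generated
    }


link = make_video_link("clips/clip_1.mp4", preview_path="clips/clip_1_preview.jpg")
print(link["url"].startswith("https://nas.example.invalid/share/"))  # True
```

Keeping the clip on the storage device and handing out only the link is what lets other users view the preview before deciding to fetch the clip itself.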
Please refer to Figure 16, which is a structural schematic diagram of a multimedia document retrieval device provided by an embodiment of the present invention. As shown in Figure 16, the multimedia document retrieval device 400 is applied to a Network Attached Storage device, the Network Attached Storage device is used for storing multimedia files, and the device 400 includes:
First receiving unit 401, configured to receive a speech retrieval instruction. The first receiving unit 401 is specifically configured to: collect voice information through a voice capture device; identify whether the voice information is in the default language; if so, convert the voice information into text information and send it to the Network Attached Storage device; if not, convert the voice information into the default language, convert it into text information, and send it to the Network Attached Storage device.
First determination unit 402, configured to determine a reference picture according to the speech retrieval instruction. The first determination unit 402 is specifically configured to: parse the speech retrieval instruction and determine the keyword of the speech retrieval instruction; obtain associated images from the internet or a local database according to the keyword; and determine the reference picture from the associated images.
Second determination unit 403, configured to determine from the multimedia file, according to the reference picture, the target image that matches the speech retrieval instruction. The second determination unit 403 is specifically configured to: identify the reference picture feature points of the reference picture; split the multimedia file into image frames; judge whether the reference picture feature points match the image feature points of each image frame; count, according to the judgment result, the number of matches between the reference picture feature points and the image feature points of each image frame; determine the confidence level of the image according to the number of matches; and determine, according to the confidence level, the target image that matches the speech retrieval instruction.
Interception unit 404, configured to intercept a video clip of a preset time period that includes the target image and store it in a destination folder, wherein, when a video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold.
In some embodiments, as shown in Figure 17, the device 400 further includes:
Second receiving unit 405, configured to receive a retrieval range instruction.
Third determination unit 406, configured to determine the multimedia file to be retrieved according to the retrieval range instruction.
In some embodiments, as shown in Figure 18, the device 400 further includes:
Processing unit 407, configured to cut or merge the video clips;
Generation unit 408, configured to generate a corresponding video link for the processed video clip.
Because the device embodiment and the above embodiments are based on the same concept, the content of the device embodiment may refer to the content of the above embodiments provided that the contents do not conflict with each other, and details are not repeated here.
Figure 19 is a structural schematic diagram of a Network Attached Storage device provided by an embodiment of the present invention. The Network Attached Storage device 500 includes:
One or more processors 510 and a memory 520; one processor 510 is taken as an example in Figure 19.
The processor 510 and the memory 520 can be connected by a bus or in other ways; connection by a bus is taken as an example in Figure 19.
The memory 520, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the multimedia document retrieval method in the embodiments of the present invention. By running the non-volatile software programs, instructions, and modules stored in the memory 520, the processor 510 executes the various functional applications and data processing of the user terminal, that is, implements the multimedia document retrieval method of the above method embodiments.
The memory 520 may include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required by at least one function; the data storage area can store data created through use of the Network Attached Storage device, and so on. In addition, the memory 520 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 520 optionally includes memory located remotely from the processor 510, and such remote memory can be connected to the Network Attached Storage device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 520 and, when executed by the one or more processors 510, perform the multimedia document retrieval method in any of the above method embodiments, for example, performing method steps S11 to S14 in Figure 4 described above and implementing the functions of units 401-404 in Figure 16.
The above product can perform the method provided by the embodiments of the present invention and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present invention.
An embodiment of the present invention further provides a non-volatile computer-readable storage medium. The computer-readable storage medium stores computer-executable instructions which, when executed by one or more processors, for example one processor 510 in Figure 19, cause the one or more processors to perform the multimedia document retrieval method in any of the above method embodiments, for example, performing method steps S11 to S18 in Figure 15 described above and implementing the functions of units 401-408 in Figure 18.
The above product can perform the method provided by the embodiments of the present invention and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present invention.
The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of the embodiments, those of ordinary skill in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, or of course by hardware. Those of ordinary skill in the art can also understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Within the idea of the present invention, the technical features of the above embodiments or of different embodiments can also be combined, the steps can be implemented in any order, and there are many other variations of the different aspects of the present invention as described above which, for brevity, are not provided in detail. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A multimedia document retrieval method applied to a Network Attached Storage device, the Network Attached Storage device being used for storing multimedia files, characterized in that the method comprises:
Receiving a speech retrieval instruction;
Determining a reference picture according to the speech retrieval instruction;
Determining from the multimedia file, according to the reference picture, a target image that matches the speech retrieval instruction;
Intercepting a video clip of a preset time period that includes the target image and storing it in a destination folder, wherein, when a video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold.
2. The method according to claim 1, characterized in that the method further comprises:
Receiving a retrieval range instruction;
Determining the multimedia file to be retrieved according to the retrieval range instruction.
3. The method according to claim 1, characterized in that receiving the speech retrieval instruction comprises:
Collecting voice information through a voice capture device;
Identifying whether the voice information is in the default language;
If so, converting the voice information into text information and sending it to the Network Attached Storage device;
If not, converting the voice information into the default language, converting it into text information, and sending it to the Network Attached Storage device.
4. The method according to claim 1, characterized in that determining the reference picture according to the speech retrieval instruction comprises:
Parsing the speech retrieval instruction and determining the keyword of the speech retrieval instruction;
Obtaining associated images from the internet or a local database according to the keyword;
Determining the reference picture from the associated images.
5. The method according to claim 4, characterized in that parsing the speech retrieval instruction and determining the keyword of the speech retrieval instruction comprises:
Converting the voice information into text information and classifying the text information;
Performing statistical analysis on the classified text information and then determining the keyword of the speech retrieval instruction.
6. The method according to claim 4, characterized in that determining the reference picture from the associated images comprises:
Receiving a user operation instruction;
Determining the reference picture according to the operation instruction;
Alternatively,
Ranking the associated images by priority according to reference frequency, image sharpness, or update time;
Determining the reference picture according to the priority.
7. The method according to claim 1, characterized in that determining from the multimedia file, according to the reference picture, the target image that matches the speech retrieval instruction comprises:
Identifying the reference picture feature points of the reference picture;
Splitting the multimedia file into image frames;
Judging whether the reference picture feature points match the image feature points of each image frame;
Counting, according to the judgment result, the number of matches between the reference picture feature points and the image feature points of each image frame;
Determining the confidence level of the image according to the number of matches;
Determining, according to the confidence level, the target image that matches the speech retrieval instruction.
8. The method according to claim 7, characterized in that counting, according to the judgment result, the number of matches between the reference picture feature points and the image feature points of each image frame comprises:
If a reference picture feature point does not match the image feature points of an image frame, continuing to judge whether the next reference picture feature point matches the image feature points of that image frame;
If a reference picture feature point matches an image feature point of an image frame, counting it toward the number of matches between the reference picture feature points and the image feature points of that image frame.
9. The method according to claim 7, characterized in that determining, according to the confidence level, the target image that matches the speech retrieval instruction comprises:
Judging whether the confidence level is higher than a preset confidence threshold;
If so, determining that the image corresponding to the image frame is a target image that matches the speech retrieval instruction.
10. The method according to claim 1, characterized in that the method further comprises:
Cutting or merging the video clips;
Generating a corresponding video link for the processed video clip.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811117840.0A CN109271533A (en) | 2018-09-21 | 2018-09-21 | A kind of multimedia document retrieval method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109271533A true CN109271533A (en) | 2019-01-25 |
Family
ID=65198823
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811117840.0A Pending CN109271533A (en) | 2018-09-21 | 2018-09-21 | A kind of multimedia document retrieval method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109271533A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190125 |