CN109271533A

CN109271533A - A multimedia file retrieval method

Info

Publication number: CN109271533A
Application number: CN201811117840.0A
Authority: CN
Inventors: Yang Fudong (杨富东)
Original assignee: Shenzhen Jiuzhou Electric Appliance Co Ltd
Current assignee: Shenzhen Jiuzhou Electric Appliance Co Ltd
Priority date: 2018-09-21
Filing date: 2018-09-21
Publication date: 2019-01-25

Abstract

The embodiments of the present invention relate to the field of intelligent security and disclose a multimedia file retrieval method. The method is applied to a network attached storage (NAS) device that stores multimedia files, and includes: receiving a voice retrieval instruction; determining a reference image according to the voice retrieval instruction; determining, according to the reference image, target images in the multimedia files that match the voice retrieval instruction; and intercepting a video clip covering a preset time period that includes the target image and storing it in a destination folder, where, when one video clip includes two or more target images, the interval between any two adjacent target images is less than a preset time threshold. The invention thereby achieves automatic, efficient, and highly accurate multimedia file retrieval.

Description

A multimedia file retrieval method

Technical field

The embodiments of the present invention relate to the field of intelligent security, and in particular to a multimedia file retrieval method.

Background art

Intelligent security technology refers to the informatized transmission and storage of services and images. With the popularization and application of Internet of Things technology, urban security has developed from simple, isolated security systems into integrated city-wide systems.

Multimedia files are an important information source for intelligent security. To implement intelligent security for an area or a system of areas, for example to analyze the traffic flow, vehicle positions, public-facility safety, and weather information of a road section, a large amount of multimedia information must be collected or stored in real time. Because the volume of data is huge and the data has high security requirements, this information is usually stored on network attached storage devices.

In the course of realizing the present invention, the inventor found at least the following problem in the related art: when all video clips satisfying a certain condition must be found on a network attached storage device, for example all clips featuring a particular celebrity in the video of an awards ceremony, the video has to be played from the beginning and each clip must be cut out manually whenever a frame containing that celebrity is observed. This is time-consuming and laborious, and clips may be missed.

Summary of the invention

The embodiments of the present invention provide an automatic, efficient, and highly accurate multimedia file retrieval method.

To solve the above technical problem, the embodiments of the present invention disclose the following technical solutions:

In a first aspect, an embodiment of the present invention provides a multimedia file retrieval method applied to a network attached storage device, the network attached storage device being used to store multimedia files, the method comprising:

receiving a voice retrieval instruction;

determining a reference image according to the voice retrieval instruction;

determining, according to the reference image, a target image in the multimedia file that matches the voice retrieval instruction;

intercepting a video clip of a preset time period that includes the target image and storing it in a destination folder, wherein, when one video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold.

Optionally, the method further includes:

receiving a retrieval range instruction;

determining the multimedia files to be retrieved according to the retrieval range instruction.

Optionally, receiving the voice retrieval instruction includes:

collecting voice information through a voice capture device;

identifying whether the voice information is in a default language;

if so, converting the voice information into text information and sending it to the network attached storage device;

if not, converting the voice information into the default language, then converting it into text information and sending it to the network attached storage device.

Optionally, determining the reference image according to the voice retrieval instruction includes:

parsing the voice retrieval instruction to determine its keywords;

obtaining associated images from the Internet or a local database according to the keywords;

determining the reference image from the associated images.

Optionally, parsing the voice retrieval instruction to determine its keywords includes:

converting the voice information into text information and classifying the text information;

performing statistical analysis on the classified text information to determine the keywords of the voice retrieval instruction.

Optionally, determining the reference image from the associated images includes:

receiving a user operation instruction;

determining the reference image according to the operation instruction;

or,

ranking the associated images by priority according to reference frequency, image definition, or update time;

determining the reference image according to the priority.

Optionally, determining, according to the reference image, the target image in the multimedia file that matches the voice retrieval instruction includes:

identifying reference image feature points of the reference image;

splitting the multimedia file into image frames;

judging whether the reference image feature points match the image feature points of each image frame;

counting, according to the judgment result, the number of matches between the reference image feature points and the image feature points of each image frame;

determining the confidence of the image according to the number of matches;

determining, according to the confidence, the target image that matches the voice retrieval instruction.

Optionally, counting, according to the judgment result, the number of matches between the reference image feature points and the image feature points of each image frame includes:

if a reference image feature point does not match the image feature points of an image frame, continuing to judge whether the next reference image feature point matches the image feature points of that image frame;

if a reference image feature point matches the image feature points of an image frame, counting the number of matches between the reference image feature points and the image feature points of that image frame.

Optionally, determining, according to the confidence, the target image that matches the voice retrieval instruction includes:

judging whether the confidence is higher than a preset confidence threshold;

if so, determining that the image corresponding to the image frame is a target image that matches the voice retrieval instruction.

Optionally, the method further includes:

cutting or merging the video clips;

generating corresponding video links for the processed video clips.

In a second aspect, an embodiment of the present invention provides a multimedia file retrieval apparatus applied to a network attached storage device, the network attached storage device being used to store multimedia files, the apparatus comprising:

a first receiving unit, configured to receive a voice retrieval instruction;

a first determining unit, configured to determine a reference image according to the voice retrieval instruction;

a second determining unit, configured to determine, according to the reference image, a target image in the multimedia file that matches the voice retrieval instruction;

an interception unit, configured to intercept a video clip of a preset time period that includes the target image and store it in a destination folder, wherein, when one video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold.

Optionally, the apparatus further includes:

a second receiving unit, configured to receive a retrieval range instruction;

a third determining unit, configured to determine the multimedia files to be retrieved according to the retrieval range instruction.

Optionally, the first receiving unit is specifically configured to:

collect voice information through a voice capture device;

identify whether the voice information is in a default language;

Optionally, the first determining unit is specifically configured to:

determine the reference image from the associated images.

Optionally, the second determining unit is specifically configured to:

identify reference image feature points of the reference image;

split the multimedia file into image frames;

Optionally, the apparatus further includes:

a processing unit, configured to cut or merge the video clips;

a generation unit, configured to generate corresponding video links for the processed video clips.

In a third aspect, an embodiment of the present invention provides a network attached storage device, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the multimedia file retrieval method described above.

In a fourth aspect, an embodiment of the present invention further provides a non-volatile computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to cause a computer to perform the multimedia file retrieval method described above.

Compared with the prior art, the embodiments of the present invention have the following beneficial effects. In the provided multimedia file retrieval method, a voice retrieval instruction is received; a reference image is determined according to the voice retrieval instruction; target images matching the instruction are determined from the multimedia files according to the reference image; and a video clip of a preset time period including the target images is intercepted and stored in a destination folder, where, when one video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold. Multimedia file retrieval is thus automatic, efficient, and highly accurate.

Description of the drawings

One or more embodiments are illustrated by the figures in the accompanying drawings. These illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and, unless otherwise stated, the figures are not drawn to scale.

Fig. 1 is a schematic diagram of the network structure for the multimedia file retrieval method provided by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of the terminal device 10 in Fig. 1;

Fig. 3 is a schematic diagram of the storage regions of a network attached storage device provided by an embodiment of the present invention;

Fig. 4 is a schematic flowchart of a multimedia file retrieval method provided by an embodiment of the present invention;

Fig. 5 is a schematic flowchart of step S11 in Fig. 4;

Fig. 6 is a schematic flowchart of step S12 in Fig. 4;

Fig. 7 is a schematic flowchart of step S121 in Fig. 6;

Fig. 8 is a schematic flowchart of step S123 in Fig. 6;

Fig. 9 is another schematic flowchart of step S123 in Fig. 6;

Fig. 10 is a schematic flowchart of step S13 in Fig. 4;

Fig. 11 is a schematic flowchart of step S134 in Fig. 10;

Fig. 12 is a schematic flowchart of step S136 in Fig. 10;

Fig. 13 is an application schematic diagram of a multimedia file retrieval method provided by an embodiment of the present invention;

Fig. 14 is a schematic flowchart of a multimedia file retrieval method provided by another embodiment of the present invention;

Fig. 15 is a schematic flowchart of a multimedia file retrieval method provided by a further embodiment of the present invention;

Fig. 16 is a schematic structural diagram of a multimedia file retrieval apparatus provided by an embodiment of the present invention;

Fig. 17 is a schematic structural diagram of a multimedia file retrieval apparatus provided by another embodiment of the present invention;

Fig. 18 is a schematic structural diagram of a multimedia file retrieval apparatus provided by a further embodiment of the present invention;

Fig. 19 is a schematic structural diagram of a network attached storage device provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.

Referring to Fig. 1, Fig. 1 is a schematic diagram of the network structure for the multimedia file retrieval method provided by an embodiment of the present invention. As shown in Fig. 1, the network structure includes at least a terminal device 10, a gateway 20, a network attached storage device 30, a local area network 40, and a capture device 50. Note that the connections shown in the figure may be wired or wireless; for example, the gateway 20 and the network attached storage device 30 may be connected by a communication cable, or by a wireless Wi-Fi module, a wireless Bluetooth module, or the like.

The terminal device 10 is a user-operable device with certain processing and display capabilities. In this embodiment, the terminal device 10 includes a portable device 101 or a computer 102, where the portable device 101 includes a laptop, a tablet, a smartphone, and the like, and the computer 102 includes a desktop computer, an intelligent control system, a smart refrigerator, a smart washing machine, and the like.

Through authentication and addressing by the gateway 20, the portable device 101 or the computer 102 obtains the target multimedia file from the network attached storage device 30. The portable device 101 and the computer 102 include a user interface 11, which is the window for human-computer interaction and may be an LED, LCD, or CRT display. In some embodiments, the portable device 101 or the computer 102 includes a hardware input device; it receives instructions from the hardware input device, displays them on the user interface 11, and executes them.

Specifically, in this embodiment, the portable device 101 or the computer 102 includes a voice capture device 21 mounted inside or on the surface of the portable device 101 or the computer 102. The voice capture device 21 mainly performs signal conditioning and signal acquisition, converting the original voice signal into a sequence of speech segments. Generally, the voice capture device 21 performs signal processing such as sound-to-electric conversion, signal conditioning, and signal sampling. For example, when the user enters the voice input screen of the user interface 11 and speaks according to the prompt, the voice capture device 21 collects the voice information; alternatively, the voice capture device 21 is switched on by a trigger and collects the user's voice retrieval instruction.

In this embodiment, as shown in Fig. 2, the portable device 101 or the computer 102 further includes a pre-processing module 22, a voice training module 23, a speech recognition module 24, and a voice prompt module 25. The pre-processing module 22 is connected to the voice capture device 21 and is used to filter out interference signals, extract speech feature vectors, and quantize the extracted vectors into standard speech feature vectors. The voice training module 23 is connected to the pre-processing module 22 and performs probability statistics on the speech feature vectors extracted over multiple collections to obtain the speaker's best standard speech feature vectors; this prevents factors such as the speaker's mood or the environment from causing inaccurate feature extraction and degrading recognition. This module therefore mainly involves probability statistics and parameter estimation, and can be implemented with a hidden Markov model (HMM). The speech recognition module 24 is connected to the pre-processing module 22 and compares newly collected standard speech feature vectors with the speech models in a voice template library to determine the function of the current voice command; this module therefore mainly involves vector comparison and parameter estimation. The voice prompt module 25 is connected to the speech recognition module 24 and, according to the recognition result, prompts the user to perform related operations or explains the function currently being completed; this module therefore mainly involves speech processing such as calling voice prompt resource files, D/A conversion, and signal amplification.
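The module description above says training can be implemented with a hidden Markov model and that recognition compares new feature vectors against stored models. As an illustration only, the following minimal sketch scores a quantized feature sequence against one discrete HMM per voice command using the standard forward algorithm and picks the most likely command. The class name `DiscreteHMM`, the command names, and all probabilities are hypothetical toy values, not parameters from this patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiscreteHMM:
    """Discrete-observation HMM for one voice command (toy parameters)."""
    pi: List[float]        # pi[i]: probability of starting in state i
    A: List[List[float]]   # A[i][j]: transition probability from state i to state j
    B: List[List[float]]   # B[i][k]: probability that state i emits symbol k

    def likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm: return P(obs | model)."""
        if not obs:
            raise ValueError("observation sequence must not be empty")
        n = len(self.pi)
        # alpha[i] = P(o_1..o_t, state_t = i)
        alpha = [self.pi[i] * self.B[i][obs[0]] for i in range(n)]
        for o in obs[1:]:
            alpha = [
                sum(alpha[i] * self.A[i][j] for i in range(n)) * self.B[j][o]
                for j in range(n)
            ]
        return sum(alpha)


def recognize(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Pick the command whose model gives the quantized sequence the highest likelihood."""
    return max(models, key=lambda name: models[name].likelihood(obs))


# Hypothetical two-state left-to-right models: "retrieve" mostly emits symbol 0,
# "cancel" mostly emits symbol 1.
_A = [[0.5, 0.5], [0.0, 1.0]]
EXAMPLE_MODELS = {
    "retrieve": DiscreteHMM(pi=[1.0, 0.0], A=_A, B=[[0.9, 0.1], [0.8, 0.2]]),
    "cancel": DiscreteHMM(pi=[1.0, 0.0], A=_A, B=[[0.1, 0.9], [0.2, 0.8]]),
}
```

With these toy models, `recognize([0, 0, 0], EXAMPLE_MODELS)` returns `"retrieve"`. Because forward probabilities shrink geometrically with sequence length, a practical recognizer would work in log space or rescale `alpha` at each step; the plain form is kept here for readability.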

In some embodiments, the user interface 11 includes a search condition box, a confirm button, a back button, a next button, and the like. For example, to retrieve images of the intersection of road A and road B at time point C, the keywords converted from the voice information are entered in the search condition box. The portable device 101 or the computer 102 performs a preliminary analysis and judgment of the search condition and, via the gateway 20, sends the corresponding network attached storage device 30 a request for the images of that intersection at time point C. The network attached storage device 30 returns a response carrying those images.

In some embodiments, when the local area network 40 is a Zigbee ad hoc network, it is connected to the processor of the network attached storage device 30 through a serial port or a single-chip microcontroller, so that data can be transferred between the Zigbee network and Ethernet.

Referring also to Fig. 3, the network attached storage device 30 is divided into multiple storage regions 31. In this embodiment, the storage regions 31 include a person image library, an animal image library, a device image library, a person sound library, and an animal sound library. Each storage region 31 includes key frames 311, file identifiers 312, and timestamps 313. A key frame 311 contains image classification information or audio classification information; a file identifier 312 is in effect a file name, generally composed of a prefix and a suffix; and a timestamp 313 is a character string that uniquely identifies a moment in time. The key frame 311 determines the image category, the image category determines the storage region 31, and the file identifier 312 and timestamp 313 then further locate the multimedia files stored in the different storage regions 31.
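The lookup path just described (key-frame category, then storage region, then file identifier and timestamp) can be sketched as a small in-memory index. This is a hypothetical illustration: the region strings, the `StoredFile` fields, and the ISO 8601 timestamp format are assumptions. ISO 8601 strings written in one fixed format and time zone sort correctly as plain strings, which is why the time filter below uses string comparison.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical names for the five storage regions 31 listed above.
REGIONS = ("person_image", "animal_image", "device_image", "person_sound", "animal_sound")


@dataclass
class StoredFile:
    file_id: str     # file identifier 312: prefix plus suffix, e.g. "gate01_0830.mp4"
    timestamp: str   # timestamp 313 as ISO 8601, e.g. "2018-09-21T08:30:00"
    category: str    # classification carried by key frame 311; selects the region


@dataclass
class StorageRegions:
    regions: Dict[str, List[StoredFile]] = field(default_factory=dict)

    def add(self, f: StoredFile) -> None:
        if f.category not in REGIONS:
            raise ValueError(f"unknown category: {f.category}")
        self.regions.setdefault(f.category, []).append(f)

    def locate(self, category: str, prefix: str = "",
               start: Optional[str] = None, end: Optional[str] = None) -> List[StoredFile]:
        """Category -> region, then filter by file-name prefix and time window."""
        hits = []
        for f in self.regions.get(category, []):
            if not f.file_id.startswith(prefix):
                continue
            if start is not None and f.timestamp < start:
                continue
            if end is not None and f.timestamp > end:
                continue
            hits.append(f)
        return hits
```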

In this embodiment, the capture device 50 includes a camera device 501 and/or a recording device 502 and/or a sensing device 503, and so on. It will be understood that the camera device 501 may be a video camera; at least one camera is deployed in the physical space according to certain rules or to actual conditions, and combining the camera with a multi-axis motor maximizes its coverage. In some embodiments, an integrated camera may be used instead of a multi-axis motor combined with a camera, for example a dome all-in-one camera, a high-speed dome all-in-one camera, a camera integrated with a pan-tilt head, or an all-in-one camera with the lens built into the pan-tilt head; such all-in-one cameras can focus automatically. Preferably, the camera is waterproof, compact, high-resolution, long-lived, and equipped with a universal communication interface.

It will be understood that the capture device 50 may be an electronic device that includes a camera device 501 and/or a recording device 502 and/or a sensing device 503, such as a smart KTV system, a smart access control system, or a smartphone with a built-in camera and recorder; alternatively, the capture device 50 may be a standalone camera device 501 or recording device 502, for example a residential-community security monitor. In some embodiments, the standalone camera device 501 or recording device 502 is also combined with the sensing device 503, so that it is switched into a preset working mode (for example, turned on or off) according to the physical signals collected by the sensing device 503.

Referring to Fig. 4, Fig. 4 is a schematic flowchart of a multimedia file retrieval method provided by an embodiment of the present invention. As shown in Fig. 4, the method is applied to a network attached storage device used to store multimedia files, and includes:

S11: Receive a voice retrieval instruction.

As shown in Fig. 5, in this embodiment, step S11 specifically includes:

S111: Collect voice information through a voice capture device.

S112: Identify whether the voice information is in the default language.

S113: If so, convert the voice information into text information and send it to the network attached storage device.

S114: If not, convert the voice information into the default language, then convert it into text information and send it to the network attached storage device.
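Once S111 has captured the audio, steps S112 to S114 amount to a single branch on the detected language. The sketch below expresses that control flow with the recognition, translation, and transport services passed in as callables, because the patent does not name the engines it uses; `build_text_instruction`, the `"zh"` default language code, and all parameter names are illustrative assumptions.

```python
from typing import Callable

DEFAULT_LANGUAGE = "zh"  # hypothetical default language code


def build_text_instruction(
    audio: bytes,
    detect_language: Callable[[bytes], str],          # S112 language identification
    translate_speech: Callable[[bytes, str], bytes],  # S114 conversion to the default language
    speech_to_text: Callable[[bytes], str],           # speech recognition
    send_to_nas: Callable[[str], None],               # transport to the NAS device
    default_language: str = DEFAULT_LANGUAGE,
) -> str:
    """S112-S114: normalize the language, convert to text, and send it to the NAS."""
    if detect_language(audio) != default_language:
        # S114: not the default language, so convert it first.
        audio = translate_speech(audio, default_language)
    # S113 (or the tail of S114): convert to text and send.
    text = speech_to_text(audio)
    send_to_nas(text)
    return text
```

Injecting the services keeps the branch testable with stubs and lets the same function run on the terminal device or elsewhere.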

It will be understood that the terminal device receives the voice retrieval instruction (that is, the voice capture device collects the voice information), and that the instruction can be processed by the terminal's recognition. The voice retrieval instruction includes file-name keywords, place keywords, time keywords, person keywords, animal keywords, device keywords, and so on. For example, in the instruction "Retrieve principal XX in auditorium XX of campus XX during period XX", "auditorium XX of campus XX" is a place keyword, "period XX" is a time keyword, and "principal XX" is a person keyword.

The voice retrieval instruction may be voice information input by the user in real time, or voice information pre-recorded by the terminal device. Because the voice capture device inevitably picks up noise when collecting voice information, the pre-processing module can filter it out to reduce interference and the processing workload.

S12: Determine a reference image according to the voice retrieval instruction.

As shown in Fig. 6, in this embodiment, step S12 specifically includes:

S121: Parse the voice retrieval instruction to determine its keywords.

Referring also to Fig. 7, in this embodiment, step S121 specifically includes:

S1211: Convert the voice information into text information and classify the text information.

In this embodiment, the text information may be classified by data time, place, data duration, data size, and the like; classifying it before sending reduces the processing load on the network attached storage device.

In some embodiments, the voice information is first converted into language text information on the terminal device and then sent to the network attached storage device, which converts the language text information into the text information; this greatly reduces the processing load on the network attached storage device.

S1212: Perform statistical analysis on the classified text information to determine the keywords of the voice retrieval instruction.

The words in the text information are counted: text information containing identical, similar, or associated words is grouped into one class, and text information containing words with the same attribute is grouped into one class. This facilitates subsequent analysis and keyword extraction.
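As a rough illustration of S1212, the sketch below counts the words of an already tokenized instruction, groups them by attribute (place, time, person), and ranks each group by frequency. The fixed `LEXICON` is a hypothetical stand-in for whatever dictionary or tagger a real system would use, and grouping of merely similar or associated words is not modeled.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

# Hypothetical attribute lexicon; a real system would use a dictionary or tagger.
LEXICON: Dict[str, str] = {
    "auditorium": "place",
    "campus": "place",
    "morning": "time",
    "afternoon": "time",
    "principal": "person",
}


def extract_keywords(tokens: Iterable[str],
                     lexicon: Dict[str, str] = LEXICON) -> Dict[str, List[str]]:
    """Group tokens by attribute and rank each group by frequency (most frequent first)."""
    counts = Counter(t.lower() for t in tokens)
    groups: Dict[str, Counter] = defaultdict(Counter)
    for word, n in counts.items():
        attr = lexicon.get(word)
        if attr is not None:  # words with no known attribute are ignored
            groups[attr][word] += n
    return {attr: [w for w, _ in c.most_common()] for attr, c in groups.items()}
```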

S122: Obtain associated images from the Internet or a local database according to the keywords.

For example, to retrieve images of principal XX, all image information of the principal is obtained from the Internet or a local database. The associated images may be multiple images of the same person or object at different times and against different backgrounds, or images of different people or objects of the same type, for example "Donald Duck" cartoon images in different styles.

S123: Determine the reference image from the associated images.

As shown in Fig. 8, in this embodiment, step S123 specifically includes:

S1231: Receive a user operation instruction.

S1232: Determine the reference image according to the operation instruction.

The above is the mode in which the user confirms the reference image manually. In this embodiment, the associated images are presented on the user interface of the terminal device, and at least one reference image is determined according to the user's touch operation.

Alternatively, as shown in Fig. 9, in this embodiment, step S123 specifically includes:

S1233: Rank the associated images by priority according to reference frequency, image definition, or update time.

S1234: Determine the reference image according to the priority.

The above is the mode in which the system confirms the reference image automatically. Ranking and pushing images by reference frequency, image definition, or update time better matches most users' habits and improves retrieval efficiency.
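The automatic mode (S1233 and S1234) can be sketched as sorting on one chosen criterion and keeping the top entries. The `AssociatedImage` fields, and the idea of expressing image definition as a single numeric sharpness score, are assumptions made for illustration; the patent does not specify how each criterion is measured.

```python
from dataclasses import dataclass
from typing import List, Sequence

CRITERIA = ("reference_count", "sharpness", "updated_at")


@dataclass
class AssociatedImage:
    url: str
    reference_count: int  # how often the image is referenced elsewhere
    sharpness: float      # image definition score; higher means sharper
    updated_at: float     # update time as a Unix timestamp; higher means newer


def pick_reference_images(images: Sequence[AssociatedImage],
                          criterion: str = "reference_count",
                          top_k: int = 1) -> List[AssociatedImage]:
    """S1233: rank by one criterion (descending); S1234: keep the top_k images."""
    if criterion not in CRITERIA:
        raise ValueError(f"criterion must be one of {CRITERIA}")
    ranked = sorted(images, key=lambda im: getattr(im, criterion), reverse=True)
    return ranked[:top_k]
```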

S13: Determine, according to the reference image, the target images in the multimedia file that match the voice retrieval instruction.

As shown in Fig. 10, in this embodiment, step S13 specifically includes:

S131: Identify the reference image feature points of the reference image.

For example, when recognizing fast-moving consumer goods (FMCG), it is not enough to recognize a bottle: the system must recognize whether it is a bottle of yogurt or of beer, and, for yogurt, which brand, and even which flavor and size. The reference image feature points include the graphic trademark, word trademark, keywords, product shape, packaging color, packaging pattern, barcode, and the like. Which feature points are extracted and compared can be preset, and the comparison of reference image feature points can also be refined down to the smallest image unit, which reduces repetitive recognition work and improves efficiency.

In some embodiments, a deep learning model can be trained to recognize as many images of the same type of item as possible: a large training database is built by photographing the goods through 360° in a variety of simulated scenes, yielding the richest possible training data, from which a machine or network device learns and builds a recognition model.

S132: Split the multimedia file into image frames.
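The patent does not say whether every frame is examined. The sketch below assumes frames are sampled at a fixed interval and computes which frame indices to decode together with their timestamps, which S14 needs later. With OpenCV, for example, `total_frames` and `fps` could be read from a `cv2.VideoCapture` via `CAP_PROP_FRAME_COUNT` and `CAP_PROP_FPS`; that wiring is left out so the helper stays self-contained, and the function name is hypothetical.

```python
from typing import List, Tuple


def sample_frame_indices(total_frames: int, fps: float,
                         every_s: float) -> List[Tuple[int, float]]:
    """Return (frame_index, timestamp_s) pairs, one frame every `every_s` seconds.

    Setting every_s = 1 / fps keeps every frame.
    """
    if fps <= 0 or every_s <= 0:
        raise ValueError("fps and every_s must be positive")
    step = max(1, round(fps * every_s))  # frames to skip between samples
    return [(i, i / fps) for i in range(0, total_frames, step)]
```

For a 4-second clip at 25 fps, `sample_frame_indices(100, 25.0, 1.0)` yields frames 0, 25, 50, and 75 at 0.0 s, 1.0 s, 2.0 s, and 3.0 s.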

S133: Judge whether the reference image feature points match the image feature points of each image frame.

Note that the reference image feature points and the image feature points of each image frame are in one-to-one correspondence and can be compared point by point using three-dimensional coordinates in the same coordinate system. For example, the graphic trademark of the reference image is compared with the graphic trademark in each image frame, so that the feature-point comparison is meaningful.

S134: according to the judging result, the image of the reference picture characteristic point and each described image frame is counted The number of matches of characteristic point.

Also referring to Figure 11, in the present embodiment, step S134 is specifically included:

S1341: if the reference picture characteristic point does not match with the image characteristic point of each described image frame, continue to sentence Whether next reference picture characteristic point of breaking matches with the image characteristic point of each described image frame.

S1342: If a reference image feature point matches the image feature points of an image frame, count it toward the number of matches between the reference image feature points and the image feature points of that image frame.

When many feature points need to be compared, and a reference image feature point is found not to match the image feature points of an image frame, the method continues by checking the next reference image feature point instead of terminating the determination or starting it over. This further improves processing efficiency and also takes environmental and other factors into account; for example, individual feature points may fail to be recognized or matched.
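Steps S1341 and S1342 amount to a counting loop that skips a failed feature point instead of aborting. A minimal Python sketch follows. It assumes the one-to-one pairing is expressed by a shared key in two dictionaries and that the comparison rule is passed in as `match`; the function name and data layout are illustrative assumptions, not the embodiment's implementation.

```python
from typing import Any, Callable, Dict


def count_matches(reference: Dict[str, Any],
                  frame: Dict[str, Any],
                  match: Callable[[Any, Any], bool]) -> int:
    """Count matching feature points between a reference image and one frame.

    Both dicts map a feature-point key (its one-to-one pairing) to a descriptor.
    A mismatch, or a point missing from the frame, does not stop the loop.
    """
    matched = 0
    for key, ref_desc in reference.items():
        if key not in frame or not match(ref_desc, frame[key]):
            continue      # S1341: move on to the next reference feature point
        matched += 1      # S1342: count this match
    return matched


reference = {"logo": "A", "barcode": "6901234", "color": "red"}
frame = {"logo": "A", "barcode": "0000000"}   # the color point was not recognized
print(count_matches(reference, frame, lambda a, b: a == b))  # 1
```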

S135: Determine the confidence of the image according to the number of matches.

If only one or a few feature points match, the match may be coincidental, and the determination result may therefore be wrong.

S136: According to the confidence, determine the target image that meets the speech retrieval instruction.

Referring also to Figure 12, in this embodiment, step S136 specifically includes:

S1361: Determine whether the confidence is higher than a preset confidence threshold.

S1362: If so, determine that the image corresponding to the image frame is a target image that meets the speech retrieval instruction.
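The embodiment does not define how the confidence is computed from the number of matches. The Python sketch below assumes, purely for illustration, that the confidence is the fraction of reference feature points that matched, and applies steps S1361 and S1362 with a hypothetical threshold of 0.6; a frame is kept only when its confidence is strictly higher than the threshold.

```python
from typing import Dict, List


def confidence(matched: int, total: int) -> float:
    """Assumed definition: fraction of reference feature points that matched."""
    return matched / total if total else 0.0


def select_target_frames(match_counts: Dict[int, int], total: int,
                         threshold: float = 0.6) -> List[int]:
    """S135/S136 sketch: keep frames whose confidence is higher than the threshold.

    `match_counts` maps a frame index to its number of matched feature points.
    """
    return [idx for idx, m in sorted(match_counts.items())
            if confidence(m, total) > threshold]


# Frame 0 matched only 1 of 10 points, so it is rejected as a likely coincidence.
print(select_target_frames({0: 1, 1: 8, 2: 6, 3: 10}, total=10))  # [1, 3]
```

Under this assumption, a frame that matches only one or two points receives a low confidence, which is how the threshold guards against the coincidental matches described for step S135.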

S14: Intercept a video clip of a preset time period that includes the target image and store it in a destination folder, wherein, when one video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold.

Referring to Figure 13, Figure 13 is a schematic diagram of an application of the multimedia document retrieval method provided by an embodiment of the present invention. As shown in Figure 13, a video clip may contain a single target image or multiple target images.

The preset time periods may be equal, i.e., every video clip has the same length; for example, each video clip in "File 2" has length t1, in which case two adjacent video clips may overlap. The preset time periods may also be unequal, i.e., the video clips have different lengths; for example, in "File 3", video clip 1 has length t2 and video clip 2 has length t3, where t2 is greater than t3. In this way, all target images in the multimedia file can be selected and intercepted. When a video clip includes two or more target images, as in video clip 1 of length t2, the interval ti between two adjacent target images is less than the preset time threshold. In other words, if the interval ti between two adjacent target images is less than the preset time threshold, the two target images belong to the same video clip.
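The grouping rule for unequal preset time periods can be sketched in Python as follows. It assumes the target images have already been located as timestamps in seconds; `gap_threshold` plays the role of the preset time threshold, and the padding `pad` added around each clip is a hypothetical choice, not part of the embodiment.

```python
from typing import Iterable, List, Tuple


def group_into_clips(hit_times: Iterable[float], gap_threshold: float,
                     pad: float = 0.0) -> List[Tuple[float, float]]:
    """Group target-image timestamps (seconds) into variable-length clips.

    Two adjacent target images whose interval is less than `gap_threshold`
    fall into the same clip; otherwise a new clip is started. Each clip is
    widened by `pad` seconds on both sides (start clamped at 0).
    """
    times = sorted(hit_times)
    clips: List[Tuple[float, float]] = []
    if not times:
        return clips
    start = prev = times[0]
    for t in times[1:]:
        if t - prev < gap_threshold:
            prev = t                                   # same clip: extend it
        else:
            clips.append((max(0.0, start - pad), prev + pad))
            start = prev = t                           # interval too long: new clip
    clips.append((max(0.0, start - pad), prev + pad))
    return clips


print(group_into_clips([3, 5, 6, 30, 31, 90], gap_threshold=10, pad=2))
# [(1, 8), (28, 33), (88, 92)]
```

Because clip boundaries follow the target images, clips come out with different lengths, matching the unequal-period case; the equal-period case would instead cut fixed-length windows around each target image.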

It should be noted that the preset time threshold may be set manually by the user, or adjusted dynamically for different multimedia files.

The multimedia document retrieval method provided by the embodiments of the present invention receives a speech retrieval instruction; determines a reference image according to the speech retrieval instruction; determines, according to the reference image, the target images in the multimedia file that meet the speech retrieval instruction; and intercepts video clips of a preset time period that include the target images and stores them in a destination folder, wherein, when a video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold. Multimedia files are thus retrieved automatically, efficiently and with high accuracy.

As shown in Figure 14, an embodiment of the present invention further provides another multimedia document retrieval method, which further includes:

S15: Receive a search-range instruction.

S16: Determine the multimedia files to be retrieved according to the search-range instruction.

It can be understood that, because the Network Attached Storage device stores a large number of multimedia files, accessing all of the data on every retrieval would generate a great deal of pointless work. To improve efficiency and reduce the processing load on the processor, a search range is therefore introduced.

Referring to Figure 13, suppose the Network Attached Storage device contains "File 1", "File 2" and "File 3". When "File 1" is clearly not what the user needs, or does not meet the speech retrieval instruction, a restrictive condition can be added through the search-range instruction to exclude "File 1". A suitable search range is thus selected (i.e., the multimedia files to be retrieved are determined), and the multimedia files are retrieved within that range.
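A search-range instruction ultimately narrows the list of multimedia files before retrieval starts. The Python sketch below assumes the restrictive conditions can be expressed as shell-style name patterns handled by the standard `fnmatch` module; how the search-range instruction is actually expressed is not specified in the embodiment, and the function name is hypothetical.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Sequence


def files_in_range(all_files: Iterable[str],
                   include: Sequence[str] = ("*",),
                   exclude: Sequence[str] = ()) -> List[str]:
    """S15/S16 sketch: keep files matching an include pattern and no exclude pattern."""
    return [f for f in all_files
            if any(fnmatch(f, p) for p in include)
            and not any(fnmatch(f, p) for p in exclude)]


nas_files = ["File 1", "File 2", "File 3"]
print(files_in_range(nas_files, exclude=["File 1"]))  # ['File 2', 'File 3']
```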

As shown in Figure 15, an embodiment of the present invention further provides another multimedia document retrieval method, which further includes:

S17: Cut or merge the video clips.

S18: Generate a corresponding video link for each processed video clip.
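Step S17 leaves the cutting and merging rules open. One plausible reading, sketched in Python under that assumption, is to merge clips whose time intervals overlap (as can happen with equal-length clips) and to cut a clip down to a requested sub-range. Clips are represented as hypothetical `(start, end)` pairs in seconds, and both function names are illustrative.

```python
from typing import List, Optional, Tuple

Clip = Tuple[float, float]  # (start, end) in seconds


def merge_clips(clips: List[Clip]) -> List[Clip]:
    """Merge clips whose intervals overlap or touch into single clips."""
    merged: List[Clip] = []
    for start, end in sorted(clips):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # extend previous clip
        else:
            merged.append((start, end))
    return merged


def cut_clip(clip: Clip, keep_start: float, keep_end: float) -> Optional[Clip]:
    """Trim a clip to [keep_start, keep_end]; return None if nothing remains."""
    start, end = max(clip[0], keep_start), min(clip[1], keep_end)
    return (start, end) if start < end else None


print(merge_clips([(10, 20), (0, 12), (30, 40)]))  # [(0, 20), (30, 40)]
print(cut_clip((0, 20), 5, 8))                     # (5, 8)
```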

So that users can share clips, and given that the storage capacity of a user's terminal device is limited, in some embodiments the processed video clips are sent to the Network Attached Storage device for storage. Meanwhile, so that other users can conveniently watch a video clip without consuming too much of their data traffic, the Network Attached Storage device is controlled to generate a corresponding video link for the video clip, and other users obtain the video clip through that link.

Further, so that other users can understand the content of a video clip and better decide whether to watch it, the method also includes controlling the Network Attached Storage device to generate a preview of the video clip and bind it to the video link, so that the content of the video clip can be understood from the preview.
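The link and preview binding can be sketched in Python as a registry on the NAS that maps an unguessable token to the stored clip and its preview. The base URL `http://nas.local/share/`, the in-memory registry and the function names are hypothetical; a real device would persist the registry and serve the link over HTTP.

```python
import secrets
from typing import Dict, Optional

BASE_URL = "http://nas.local/share/"     # hypothetical share endpoint on the NAS
_shared: Dict[str, Dict[str, str]] = {}  # token -> record of a shared clip


def share_clip(clip_path: str, preview_path: str) -> str:
    """Register a processed clip, bind its preview, and return a share link."""
    token = secrets.token_urlsafe(12)    # 12 random bytes -> 16 URL-safe characters
    _shared[token] = {"clip": clip_path, "preview": preview_path}
    return BASE_URL + token


def resolve(link: str) -> Optional[Dict[str, str]]:
    """Look up the clip and preview behind a share link (None if unknown)."""
    if not link.startswith(BASE_URL):
        return None
    return _shared.get(link[len(BASE_URL):])


link = share_clip("/clips/file3_clip1.mp4", "/clips/file3_clip1.jpg")
print(resolve(link)["preview"])  # /clips/file3_clip1.jpg
```

A random token is used rather than the file path so that other users can reach a shared clip only through the link they were given.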

Referring to Figure 16, Figure 16 is a schematic structural diagram of a multimedia document retrieval device provided by an embodiment of the present invention. As shown in Figure 16, the multimedia document retrieval device 400 is applied to a Network Attached Storage device, which is used to store multimedia files, and the device 400 includes:

First receiving unit 401, configured to receive a speech retrieval instruction. The first receiving unit 401 is specifically configured to: collect voice information through a voice capture device; identify whether the voice information is in the default language; if so, convert the voice information into text information and send it to the Network Attached Storage device; if not, convert the voice information into the default language, then convert it into text information and send it to the Network Attached Storage device.

First determination unit 402, configured to determine a reference image according to the speech retrieval instruction. The first determination unit 402 is specifically configured to: parse the speech retrieval instruction and determine its keywords; obtain associated images from the Internet or a local database according to the keywords; and determine a reference image from the associated images.

Second determination unit 403, configured to determine, according to the reference image, the target images in the multimedia file that meet the speech retrieval instruction. The second determination unit 403 is specifically configured to: identify the reference image feature points of the reference image; split the multimedia file into image frames; determine whether the reference image feature points match the image feature points of each image frame; count, according to the determination result, the number of matches between the reference image feature points and the image feature points of each image frame; determine the confidence of the image according to the number of matches; and determine, according to the confidence, the target images that meet the speech retrieval instruction.

Interception unit 404, configured to intercept a video clip of a preset time period that includes the target image and store it in a destination folder, wherein, when one video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold.
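The processing performed by the first receiving unit 401 and the first determination unit 402 can be sketched in Python as below. The language detection, translation and speech-to-text engines are not specified, so they are passed in as hypothetical callables. The keyword extraction uses whitespace splitting with an assumed stop-word list (suitable only for space-delimited text such as English), and a dictionary stands in for the local image database. Choosing the reference image by user selection, or otherwise by order, follows the two options listed in claim 6, with list order assumed to represent priority.

```python
from typing import Callable, Dict, List, Optional

STOPWORDS = {"find", "show", "me", "the", "a", "all", "of", "with"}  # assumed list


def voice_to_query(audio: bytes,
                   detect_language: Callable[[bytes], str],
                   translate: Callable[[bytes, str], bytes],
                   transcribe: Callable[[bytes], str],
                   default_lang: str = "zh") -> str:
    """Unit 401 sketch: return the text query that would be sent to the NAS."""
    if detect_language(audio) != default_lang:
        audio = translate(audio, default_lang)  # not the default language: convert first
    return transcribe(audio)                    # then convert speech to text


def extract_keywords(query: str) -> List[str]:
    """Unit 402 sketch: whitespace tokenization plus stop-word removal."""
    return [w for w in query.lower().split() if w not in STOPWORDS]


def lookup_images(keywords: List[str], index: Dict[str, List[str]]) -> List[str]:
    """Collect associated images from a keyword -> image-path index, without duplicates."""
    seen, images = set(), []
    for kw in keywords:
        for img in index.get(kw, []):
            if img not in seen:
                seen.add(img)
                images.append(img)
    return images


def pick_reference(images: List[str], choice: Optional[int] = None) -> Optional[str]:
    """Use the user's choice if given, else the first (highest-priority) image."""
    if not images:
        return None
    return images[choice] if choice is not None else images[0]


local_db = {"red": ["red_01.jpg"], "truck": ["truck_07.jpg", "red_01.jpg"]}
keywords = extract_keywords("Show me the red truck")        # ['red', 'truck']
print(pick_reference(lookup_images(keywords, local_db)))    # red_01.jpg
```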

In some embodiments, as shown in Figure 17, the device 400 further includes:

Second receiving unit 405, configured to receive a search-range instruction.

Third determination unit 406, configured to determine the multimedia files to be retrieved according to the search-range instruction.

In some embodiments, as shown in Figure 18, the device 400 further includes:

Processing unit 407, configured to cut or merge the video clips;

Generation unit 408, configured to generate a corresponding video link for each processed video clip.

Because the device embodiment and the method embodiments above are based on the same concept, the content of the method embodiments applies to the device embodiment wherever it does not conflict, and is not repeated here.

Figure 19 is a schematic structural diagram of a Network Attached Storage device provided by an embodiment of the present invention. The Network Attached Storage device 500 includes:

one or more processors 510 and a memory 520; one processor 510 is taken as an example in Figure 19.

The processor 510 and the memory 520 may be connected by a bus or in other ways; connection by a bus is taken as an example in Figure 19.

As a non-volatile computer-readable storage medium, the memory 520 can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the multimedia document retrieval method in the embodiments of the present invention. By running the non-volatile software programs, instructions and modules stored in the memory 520, the processor 510 executes the various functional applications and data processing of the user terminal, that is, implements the multimedia document retrieval method of the above method embodiments.

The memory 520 may include a program storage area and a data storage area: the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store data created through use of the Network Attached Storage device, and the like. In addition, the memory 520 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the memory 520 optionally includes memory located remotely from the processor 510, and such remote memory can be connected to the Network Attached Storage device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory 520 and, when executed by the one or more processors 510, perform the multimedia document retrieval method of any of the above method embodiments, for example, performing method steps S11 to S14 in Figure 4 described above and implementing the functions of units 401-404 in Figure 16.

The above product can perform the method provided by the embodiments of the present invention and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, refer to the method provided by the embodiments of the present invention.

An embodiment of the present invention further provides a non-volatile computer-readable storage medium storing computer-executable instructions. When executed by one or more processors, such as the processor 510 in Figure 19, the instructions cause the one or more processors to perform the multimedia document retrieval method of any of the above method embodiments, for example, performing method steps S11 to S18 in Figure 15 described above and implementing the functions of units 401-408 in Figure 18.

The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

From the description of the above embodiments, those of ordinary skill in the art can clearly understand that the embodiments may be implemented by software plus a general-purpose hardware platform, or of course by hardware. Those of ordinary skill in the art can also understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Under the concept of the present invention, technical features in the above embodiments or in different embodiments may be combined, steps may be performed in any order, and there are many other variations of the different aspects of the present invention as described above, which are not given in detail for brevity. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A multimedia document retrieval method, applied to a Network Attached Storage device used to store multimedia files, characterized in that the method comprises:

Receiving a speech retrieval instruction;

Determining a reference image according to the speech retrieval instruction;

Determining, according to the reference image, a target image in the multimedia file that meets the speech retrieval instruction;

Intercepting a video clip of a preset time period that includes the target image and storing it in a destination folder, wherein, when one video clip includes two or more target images, the interval between two adjacent target images is less than a preset time threshold.

2. The method according to claim 1, characterized in that the method further comprises:

Receiving a search-range instruction;

Determining the multimedia files to be retrieved according to the search-range instruction.

3. The method according to claim 1, characterized in that receiving the speech retrieval instruction comprises:

Collecting voice information through a voice capture device;

Identifying whether the voice information is in the default language;

If so, converting the voice information into text information and sending it to the Network Attached Storage device;

If not, converting the voice information into the default language, then converting it into text information and sending it to the Network Attached Storage device.

4. The method according to claim 1, characterized in that determining a reference image according to the speech retrieval instruction comprises:

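Parsing the speech retrieval instruction and determining keywords of the speech retrieval instruction;

Obtaining associated images from the Internet or a local database according to the keywords;

Determining a reference image from the associated images.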

5. The method according to claim 4, characterized in that parsing the speech retrieval instruction and determining keywords of the speech retrieval instruction comprises:

6. The method according to claim 4, characterized in that determining a reference image from the associated images comprises:

Receiving a user operation instruction;

Determining the reference image according to the operation instruction;

Alternatively,

Determining the reference image according to the priority.

7. The method according to claim 1, characterized in that determining, according to the reference image, a target image in the multimedia file that meets the speech retrieval instruction comprises:

Identifying the reference image feature points of the reference image;

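Splitting the multimedia file into image frames;

Determining whether the reference image feature points match the image feature points of each image frame;

Counting, according to the determination result, the number of matches between the reference image feature points and the image feature points of each image frame;

Determining the confidence of the image according to the number of matches;

Determining, according to the confidence, the target image that meets the speech retrieval instruction.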

8. The method according to claim 7, characterized in that counting, according to the determination result, the number of matches between the reference image feature points and the image feature points of each image frame comprises:

If a reference image feature point does not match the image feature points of an image frame, continuing to determine whether the next reference image feature point matches the image feature points of that image frame;

If a reference image feature point matches the image feature points of an image frame, counting it toward the number of matches between the reference image feature points and the image feature points of that image frame.

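9. The method according to claim 7, characterized in that determining, according to the confidence, the target image that meets the speech retrieval instruction comprises:

Determining whether the confidence is higher than a preset confidence threshold;

If so, determining that the image corresponding to the image frame is the target image that meets the speech retrieval instruction.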

10. The method according to claim 1, characterized in that the method further comprises:

Cutting or merging the video clip;

Generating a corresponding video link for the processed video clip.