CN103294746A

CN103294746A - Method and device of de-identification in visual media data

Info

Publication number: CN103294746A
Application number: CN2013100163589A
Authority: CN
Inventors: T·F·希达-马穆德; D·J·贝莫尔; O·U·F·肖克; D·B·庞塞里昂; 时代
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2012-01-16
Filing date: 2013-01-16
Publication date: 2013-09-11
Anticipated expiration: 2033-01-16
Also published as: US20130182007A1; CN103294746B; US9147179B2; US20130182006A1; US9147178B2

Abstract

A method of de-identification in visual media data is described. The method includes merging a sequence of images from a set of visual media data into an averaged image; bounding portions of the averaged image that are determined to be relatively fixed, wherein each bounded portion is identified by a corresponding position in the averaged image; generating a template comprising the bounded portions and the corresponding position for each bounded portion in the averaged image; and de-identifying the sequence of images by obfuscating content in the bounded portions.

Description

The method and system identified of going that is used for the visual media data

Technical field

Control demand and professional consideration usually require the mode swap data with safety, particularly in health care." health insurance portability and accountability act " in 1996 (HIPAA) stipulated to require safely swap data and non-disclosure patient information specific.Therefore, the data of some type must be modified with obfuscation (obfuscate) sensitivity or security information before by exchange.

Solution goes the method for identification (de-identification) to address the problem usually: what (1) should go to identify, (2) how much should go identification, should and how (3) go identification? the several different methods of going to identify document and metadata comprises that removal is marked as the built-in code of the part of identifying, and perhaps writes the method based on template from the information of the document in the document classification.Go recognition methods be used to text document, such as the structural metadata field in digital imaging and communications in medicine (DICOM) metadata, but the visual media data when sign embeds in the content to go to identify be difficult with consuming time.

Summary of the invention

The embodiment of system has been described.In one embodiment, this system is that visual media goes recognition system.This system comprises: the image combiner, and it is configured to and will merges into the average image from the image sequence of visual media data acquisition; And removing to identify engine, it is configured to: define the relatively-stationary part that is confirmed as of (bound) the average image, wherein each is defined part and is identified by the relevant position in the average image; Generation comprises that each that be defined part and be used for the average image is defined the template of the relevant position of part; And by obfuscation be defined the part content make a return journey identification described image sequence.Other embodiment of this system have also been described.

The embodiment of computer program has also been described.In one embodiment, this computer program comprises the computer-readable recording medium of storing computer-readable program, and wherein this computer-readable program makes this computing machine carry out the operation that is used for going to identify the visual media data when being carried out by the processor in the computing machine.Described operation comprises: will merge into the average image from the image sequence of visual media data acquisition; Define the relatively-stationary part that is confirmed as of the average image, wherein each is defined part and is identified by the relevant position in the average image; Generation comprises that each that be defined part and be used for the average image is defined the template of the relevant position of part; And by obfuscation be defined the part content make a return journey identification described image sequence.Other embodiment of this device have also been described.

The embodiment of method has also been described.In one embodiment, this method is be used to the method for going to identify the visual media data.This method comprises: will merge into the average image from the image sequence of visual media data acquisition; Define the relatively-stationary part that is confirmed as of the average image, wherein each is defined part and is identified by the relevant position in the average image; Generation comprises that each that be defined part and be used for the average image is defined the template of the relevant position of part; And by obfuscation be defined the part content make a return journey identification described image sequence.Other embodiment of this method have also been described.

Description of drawings

Fig. 1 shows the synoptic diagram that visual media removes an embodiment of recognition system;

The visual media that Fig. 2 shows Fig. 1 removes the synoptic diagram of an embodiment of recognition system;

Fig. 3 shows for generating the process flow diagram of an embodiment that visual media goes the method for recognition template;

Fig. 4 shows the synoptic diagram of an embodiment of the average image of Fig. 1;

Fig. 5 shows the synoptic diagram of an embodiment of the average image of Fig. 1;

The visual media that Fig. 6 shows Fig. 1 removes the synoptic diagram of an embodiment of recognition template;

Fig. 7 shows the synoptic diagram be used to an embodiment of the method for going to identify the visual media data;

In whole instructions, similar label can be used for identifying similar parts.

Embodiment

Be readily appreciated that, here general description and in the accompanying drawings the parts of illustrated each embodiment can diversified different configuration be arranged and design.Therefore, be not to be intended to limit the scope of the present disclosure to the following detailed description as each embodiment of representing in the accompanying drawing, but only be used for each embodiment of explanation.Although the many aspects of embodiment present in the accompanying drawings, accompanying drawing is not necessarily drawn in proportion, unless specialize.

The present invention may be embodied in and not break away from other spirit or essential characteristic in other particular forms.It is illustrative that described embodiment will be understood that in all respects, and nonrestrictive.Therefore, scope of the present invention is indicated by appended claims, and can't help the indication of this detailed description.The implication and all changes within the scope that are in the equivalent of claim will be included within the scope of claim.

The feature of mentioning in the whole instructions, advantage or similar language do not mean that available all these feature and advantage of the invention process should or determine in single embodiment of the present invention.On the contrary, the language of mentioning feature and advantage is understood that to mean that being relevant to special characteristic, advantage or the characteristics that embodiment describes is included at least one embodiment of the present invention.Therefore, in the whole instructions to the discussion of characteristics and advantage and similar language throughout can but the identical embodiment of nonessential finger.

In addition, described feature of the present invention, advantage and characteristics can be combined by any way in one or more embodiments.Those skilled in the relevant art will understand, and according to the description here, the present invention can be implemented under the situation of the one or more special characteristics that do not have specific embodiment or advantage.In other cases, can identify in certain embodiments is not the accompanying drawing feature and advantage that appear among all embodiment of the present invention.

" embodiment ", " embodiment " or the similar language throughout of mentioning in the whole instructions means that the special characteristic, structure or the characteristics that are relevant to indicated embodiment description are included at least one embodiment of the present invention.Therefore, phrase " in one embodiment ", " in an embodiment " or the similar language throughout in the whole instructions can but nonessentially all refer to identical embodiment.

Although described a lot of embodiment here, at least some described embodiment have provided for the system and method that goes to identify from the secret or sensitive data of visual media data.More specifically, this system merges the image sequence from the visual media data acquisition, and automatically identifies the position that comprises text in the average image probably.The user can manually improve the selected part of the average image, to determine which partly comprises sensitivity or security information, and system generates template from the average image and selected part, this template can be applied to the image sequence in the described media data set, so that the security information in each image sequence of obfuscation.

Be used for determining going identification and so on some conventional methods to comprise at text data: (i) use edit tool hand labeled sensitizing range, and the zone in the solution documentation by Manual Logos so that deletion; (ii) go identification automatically, wherein use the text mining method to find the responsive fragment of text, such as title, date and address, no matter it is in structured data fields or in free text (freetext); And the (iii) fixedly known field in the structural data, for example, title, date and address can be removed by the code at this field structureization especially.Conventional method also manually marked region in case determine identification what.Be used for how going the conventional method identified to comprise removing the built-in code that is identified part or use method based on template.In the method based on template of routine, this template is to come manual creation by highlight the zone that (highlight) will write in the sample files of given classification.When the form (form) of limited quantity but the time spent, this method can be useful.

In image and video, sensitive information can embed in the various layouts, thereby creates a lot of Format Types.For DICOM image and video, the specific data of the patient who sees in image will change along with pattern (modality) type (echo or angiogram), manufacturer's (different manufacturers may show different information at its screen) and inspection itself (it further depends on corresponding disease and check result thereof) especially.The combination of these factors causes a large amount of commendation types that will see in the visual media data, become very loaded down with trivial details and the cost poor efficiency thereby manually learn template.For example, in typical echo record, nearly 50 different Format Types can occur, it is corresponding to about 146 detections that surpass of depending on disease by ultrasonic inspection person's record.Therefore, can at least semi-automatically generate system be used to the template of going to identify the visual media data can provide to various types of images and video fast and go identification efficiently.In certain embodiments, can be from the sample set study template of visual media data source.In addition, can realize correction or the modification of template by semi-automatic process described herein.In addition, the training stage forming template, template (or elementary version of template) can be applied to the candidate region in the image, so that the some or all of of identifying of picture/text content that define in the frame of recognition template may take place.

Fig. 1 shows the synoptic diagram that visual media removes an embodiment of recognition system 100.Shown visual media goes recognition system 100 to comprise the multiple assembly of following detailed description, and it can carry out function described herein and operation.In one embodiment, go at least some assemblies in the recognition system 100 in computer system, to realize.For example, go the function of one or more assemblies of recognition system 100 to realize by being stored in the computer memory device 100 and by the computer program instructions of carrying out such as the treatment facility 104 of CPU.Go recognition system 100 can comprise other assemblies, such as input-output apparatus 106, disk storage driver 108, image combiner 110, remove to identify engine 112 and template generator 114.Visual media goes some or all combinations of recognition system 100 can be stored in the single computing equipment, or is stored in the network of the computing equipment that comprises cordless communication network.Go recognition system 100 can comprise than more or less assembly described herein or subsystem.In certain embodiments, go recognition system 100 to can be used for implementing method as shown in Figure 7 described herein.

In one embodiment, image combiner 110 receives image sequence 116 from visual media data acquisition 118.In certain embodiments, described visual media data acquisition 118 can comprise image or the video that uses Medical Devices to catch, and comprises ultrasonoscopy, echocardiogram image, angiographic image or any other visual media.In other embodiments, described visual media data acquisition 118 can comprise image or the video that use is caught or generated for the equipment of other application.Each image sequence 116 can comprise a lot of images.For example, a video can comprise a lot of single image frames that per second is caught.

In certain embodiments, image combiner 110 is configured to receive image sequence 116 from a plurality of visual media data acquisitions 118.Each visual media data acquisition 118 can be corresponding to different Machine Types.Can a plurality of visual media data acquisitions be divided into groups according to predetermined classification, the categorizing system of described classification such as Machine Type, position, hospital, department or any other type, for described categorizing system, image sequence 116 in each visual media data acquisition 118 have with corresponding visual media data acquisition 118 in common some visual interest of other image sequences 116, such as layout, text similarity or other characteristics.

In one embodiment, image combiner 110 merges image in the image sequence 116 by come average pixel value 120 at each image of striding entire image sequence 116.Obtain the average image 122 like this, its pixel value 120 is that all images of striding in the image sequence 116 is average.In one embodiment, because the average image 122 comprises average pixel value 120, the average image 122 can only show all or the constant or fixing pixel of most of image of striding in the image sequence 116.The algorithm that is used for average pixel value 120 can be based on other characteristics of brightness, color value, saturation degree and/or pixel.In certain embodiments, the image of striding image sequence 116 pixel that do not satisfy preset frequency threshold value 124 is filtered out from the average image 122.This frequency threshold 124 can be any value of the text field in permission system 100 enough recognition image sequences 116.

In one embodiment, the pixel value 120 that produces in the average image 122 shows with the mean flow rate at the pixel value 120 of each location of pixels.Therefore, depend on the color mode of image, the pixel that does not have constant active value can be shown as black picture element in the average image 122.For example, comprise the black pixel value 120 of striding the constant pixel of all or most of image for not being from the average image 122 that is ultrasonic image sequence 116, have white or bright pixel value 120 and stride the pixel that all or most of image have a geostationary value.So any fixed text of image can be maintained in the average image 122, be constant because pixel value 120 is striden all images.

Removing to identify engine 112 is configured to define and is confirmed as relatively-stationary part in the average image 122.In one embodiment, be defined part 128 corresponding to the bright pixel value 120 in the average image 122.Remove to identify engine 112 and can keep to have a certain size or be positioned at and be defined zone 128 within other certain distances that are defined part 128, and discardable do not satisfy this requirement be defined zone 128.

In one embodiment, remove to identify engine 112 and be configured to define from the phase joined assemblies (component) of the average image 122 finding character 132 or the potential character in the average image 122, and produce character picture.Described phase joined assemblies can be comprising all pixels have the pixel region of bright pixel value 120.System 100 also can be configured to by using potential word 130 in optical character identification (OCR) the software identification the average image 122 and text to define word from equal image 122, to generate word image.Word 130 as described herein can comprise any combination of one or more characters 132.Based on word image and character picture, can keep and be defined part 128 so remove to identify engine 112, therein, being defined character 132 and overlapping from the word 132 that is defined of word image from character picture of certain predetermined percentage.Therefore, largely be defined the phase joined assemblies that is defined that word 130 or word components overlap and be maintained in the phrase image.

Template generator 114 uses the phrase images to generate template 126, to be used for the identification of going at the image sequence 116 of particular visual media data set 118.Being defined part 128 and can being defined the relevant position 134 of part in the average image 122 with each and being included in the template 126 in the phrase image 116.In certain embodiments, before generating template 126, the described part 128 that is defined can be modified based on the manual user input.In certain embodiments, the aspect of improving operation can be selected but automatic or automanual based on the average further inspection of pixel and/or user.In addition, in certain embodiments, template can template be generated and initially come into operation after a period of time further improved.Can use obfuscation in template 126 each image sequence 116 in viewdata set 118 to define content in the part 128.

In certain embodiments, extract picture material and the content of text that is defined in the part.Then, identify text message and it is grouped in the semantic entity by the frame that defines of analyzing character and word.In other embodiments, can realize other operations and/or analyze and to extract and to identify content of text.

The visual media that Fig. 2 shows Fig. 1 removes the synoptic diagram of an embodiment of recognition system 100.The image sequence 116 that goes recognition system 100 to receive corresponding to visual media data acquisition 118.Go recognition system 100 to comprise image combiner 110, be used for and merge into the average image 122 from the image of image sequence 100.Image combiner 110 can use any method merging from the creation of image the average image of image sequence 116.Relative fixed or constant assembly in whole or some image in the average image displayable image sequence 116.

After the image in the image sequence 116 is merged into the average image 122, go to identify the part that engine 112 finds to comprise probably in the average image 122 text or word 130, and use the position of defining frame or other visual confining method retrtievals.Can import 200 based on the user improves and is defined part 128 to keep or to remove the frame that defines in the average image 122.

Template generator 114 uses the average image 122 to remove recognition template 126 based on part 128 generations that are defined in the average image 122 then.This template 126 can be used for by in the obfuscation image sequence 116 corresponding to the recognition image sequence 116 of making a return journey of the content that is defined part 128 and relevant position thereof in the average image 122.Can by described content is removed from image, with other guide replace described content, make described content fuzzy or otherwise revise described in its part perhaps, come the described content of obfuscation.In certain embodiments, can use several fuzzy methods to come the obfuscation content, to guarantee to go completely identification.To the checking authority of this content and to the certain user as seen in certain embodiments, can be according to by the content of obfuscation.System 100 is configured to import 200 based on the user and sets up being defined the authority of checking of part 128, thereby checks that authority determined that the content that when is defined part 128 can check for given user.For example, the content of given area can be checked a user, and another user is blured.In addition, the different content in the image can comprise for user's difference and checks authority.

Template 126 also can be used for identifying other image sequences 116 in the visual media data acquisition 118.In one embodiment, image sequence 116 in the visual media data acquisition 118 has similar or identical layout (geographic layout), thereby the location matches of the sensitive data in each image in the position that is defined part 128 in the template 126 and other image sequences 116, this allows the image sequence 116 in the visual media data acquisition 118 to be gone identification fast and effectively.

Fig. 3 illustrates for generating the process flow diagram of an embodiment that visual media goes the method for recognition template 126.Although method 300 is to describe in conjunction with the recognition system 100 of going of Fig. 1, the embodiment of method 300 can use the recognition system 100 of going of other types to implement.

In one embodiment, go recognition system 100 that image sequence 116 is merged into single the average image 122.For example, the average image 122 can anyly can be described this image sequence 116 in an image any way obtains, and this image has pixel value average on all images of this image sequence 116 120.Noise 304 also can be removed by this system 120 from the average image 122, more easily determine the part that will be defined of the average image 122 to allow recognition system 100.

Go recognition system 100 to draw 306 characters around the phase joined assemblies in the average image 122 or in the copy of the average image 122 then and define frame 400, to obtain character picture 308.Go recognition system 100 also to draw 310 words around the word in the average image 122 or in the copy of the average image 122 130 and define frame, to obtain word image 312.In a plurality of embodiment, go to carry out in the process of process that recognition system 100 can be at the same time or order and define operation automatically.

The character picture that produces then and word image 312 can be used for finding to comprise in the average image 122 the most probable part of text.In one embodiment, go recognition system 100 to keep 314 parts that are defined that comprise one or more character frames of completely or partially overlapping with one or more word frames.For example, merge with the most of or portion of closing of a word frame or a plurality of character frames of overlapping can make the delimited area of recognition system 100 reservations corresponding to described character frame and word frame.The number percent of the coincidence between character frame and the word frame can allow draw characters image and word image 312 among both define frame 400 time some error, and still keep be defined the height possibility that the zone comprises text.In certain embodiments, go recognition system 100 to define phrase based on the distance between the delimited area.Can use by the delimited area that keeps the character/word frame generation that overlaps and create phrase image 316.This phrase image 316 can be used for generating the template 126 be used to removing to identify described image sequence 116 or other image sequences 116.

Fig. 4 shows the synoptic diagram of an embodiment of the average image 122 of Fig. 1.Although described in conjunction with the average image 122 of Fig. 4 here and gone recognition system 100, gone recognition system 100 to use in conjunction with any the average image 122.

In one embodiment, remove the phase joined assemblies in the recognition system 100 discovery the average images 122, and define frame 400 around the drafting of phase joined assemblies.The phase joined assemblies can be that the text 132 that links to each other in one group, image artifacts (artifact) or other have a plurality of zones of enlivening pixel region.Character defines algorithm can define frame 400 around drawing greater than pre-sizing or the assembly that links to each other that satisfies certain other threshold values, and this can help to reduce the error that detects the character 132 in the average image 122.Do not satisfy assembly or the pixel of threshold value in the average image 122 and can stay in the average image 122, maybe can from image, remove to remove noise or unnecessary assembly.Can be by indicating clear visual separation between the phase joined assemblies in the average image 122 defining leaving space between the frame 400 around the frame 400 that defines that the assembly that links to each other is drawn.

In one embodiment, during phase joined assemblies in defining the average image 122, the user can manually improve the position of defining frame 400 or the quantity in the character picture 308.The user can remove In the view of the user obviously be non-character element define frame 400.Perhaps, the user can be around the phase joined assemblies hand drawn frame that goes recognition system 100 to miss.

Fig. 5 shows the synoptic diagram of an embodiment of the average image 122 of Fig. 1.Although described in conjunction with the average image 122 of Fig. 5 here and gone recognition system 100, gone recognition system 100 to use in conjunction with any the average image 122.

In one embodiment, go recognition system 100 to find word 130 in the average images 122, and draw around word 130 or the zone that comprises text probably and to define frame 400.It is text filed to go recognition system 100 can use the OCR engine to find.The OCR engine can determine that each zone in the average image 122 has the value of the confidence of text.If the value of the confidence satisfies certain threshold value, this zone is considered to the candidate, draw for this zone and define frame 400, and produce respectively define the word image 312 that frame 400 is configured for the average image 122.

In one embodiment, during text filed in defining the average image 122, the user can manually improve the position of defining frame 400 or the quantity in the word image 312 that produces.The user can remove In the view of the user obviously be non-word or text component define frame 400.Perhaps, the user can be the regional hand drawn frame that recognition system 100 is missed.

Fig. 6 shows the synoptic diagram that visual media removes an embodiment of recognition template 126.Although described in conjunction with the average image 122 of Fig. 6 here and gone recognition system 100, gone recognition system 100 to use in conjunction with any the average image 122.

In certain embodiments, frame 400 is defined in the assembly that is not defined in character picture 308 drafting of going recognition system 100 can center in the word image 312.Otherwise frame 400 is defined in the assembly that does not define in character picture 312 drafting of going recognition system 100 can center in the character picture 308.By compare string image 308 and word image 312, go recognition system 100 can determine which frame most possibly comprises text.

Go recognition system 100 can keep from character picture 308 with from word image 312 define that frame 400 overlaps define frame 400.In certain embodiments, one or more character frames can overlap with one or more word frames.For example, single character frame can fully or substantially fully overlap with the word frame, thereby but remove recognition system 100 reserved character frames or word frame, perhaps can generate the new delimited area in the merging zone that comprises character frame and word frame.In another example, a plurality of character frames can fully or substantially fully overlap with the word frame, thereby but remove recognition system 100 reserved character frames, word frame or comprise the new delimited area in the merging zone of all character frames and word frame.In another example, a plurality of character frames can fully or substantially fully overlap with a plurality of word frames, thereby but remove recognition system 100 reserved character frames, word frame or comprise the character frame and one or more delimited area in the merging zone of word frame in some or all.Go recognition system 100 to keep the zone that is defined according to other combinations of character frame and word frame or other embodiment that do not describe here.

Can be used for generation then by the delimited area of going recognition system 100 to keep and remove recognition template 126.The content that is defined the position 134 definable templates of zone in phrase image 316 will be by the part of obfuscation, and described phrase image 316 can be the average image 122, has wherein drawn in some zone of the average image 122 and has defined frame 400.When template 126 is applied to image in the image sequence 116, go recognition system 100 can seek in the image zone corresponding to the position that is defined zone 128 of template 126, and these zones in the obfuscation image automatically.In certain embodiments, template 126 can be applied to Streaming Media, thereby can use template 126 to handle the live video record, to remove sensitive data from live video.

Fig. 7 shows the process flow diagram be used to an embodiment of the method 700 of going to identify the visual media data.Although the recognition system 100 of going in conjunction with Fig. 1 has been described method 700, the embodiment of method 700 can use the recognition system 100 of going of other types to realize.

In one embodiment, go recognition system 100 to merge 710 from the image sequence 116 of visual media data acquisition 118 and be the average image 122.The average image 122 can be created by striding average 705 pixel values 120 of all images in the image sequence 116 and filter out 715 pixel values 120 that do not satisfy preset frequency threshold value 124 from the average image 122 by system 100.

In one embodiment, system 100 defines the relatively-stationary part that is confirmed as of 720 the average images 122.Each is defined part 128 and can be identified by the relevant position 134 in the average image 122.In one embodiment, the part that defines the average image 122 comprises defining from the phase joined assemblies of the average image 122 and finds that character 132 is to produce character picture 308.The part that defines the average image 122 comprises and defining from the word of the average image 122 to produce word image 312.This can comprise that the part of analyzing the average image 122 comprises the value of the confidence of text with the part that obtains to analyze.The part of analyzing can be analyzed by the OCR engine.In one embodiment, this OCR engine is adapted to especially with going recognition system 100 to use.System 100 can satisfy the word threshold value in response to definite the value of the confidence, determines that the part of analyzing is word candidates.System 100 can keep such part 128 that is defined then, wherein being defined character and overlapping from the word 130 that is defined of word image 312 from character picture 308 of predetermined percentage.In one embodiment, system 100 merges to be positioned at mutually and is defined part 128 to form phrase within the predetermined mean distance.A plurality of phrases can be defined at together, so that it is more efficient to be defined the obfuscation in zone.

System 100 can generate then and comprise that being defined part 128 is defined the template 126 of the position 134 of part 128 in the average image 122 with each, thereby has kept the position that is defined part 128 134 in the average image 122 in template 126.In certain embodiments, the average image 122 with part of being defined 128 can be used as template 126.In other embodiments, can generate new template file based on the average image 122.In one embodiment, being defined part 128 and can importing 200 based on the user and manually improved 725 in template 126 or the average image 122.The user can determine to be included in one in template 126 or the average image 122 and be defined part 128 and do not correspond to sensitivity or security information, and this can be defined frame 400 and remove from template 126.In another embodiment, the user determines that the part corresponding to sensitive data of the average image 122 is not defined, and responsive drafting that the user can center in template 126 or the average image 122 defined frame 400 partially manually.The user also can set up the authority of checking that is defined part 128 for the average image 122, and the described authority of checking determines when that being defined part 128 can check for given user.

Then, can use 735 templates 126 going to identify for the image sequence 116 that generates template 126, this is to realize by the content corresponding to the position that is defined part 128 of template 126 in the obfuscation image.This all images that can allow to stride in the image sequence 116 carries out consistent obfuscation.In one embodiment, template 116 is used to identify other image sequences 116 in the visual media data acquisition 118 then.Described other image sequences 116 can be shared adjacent characteristics with the image sequence 116 that is used for generation template 126, such as the placement position of the text in each image, object or other assemblies.In certain embodiments, system 100 can be each visual media data acquisition 118 and generates a template 126, thereby system 100 can go recognition image sequence 116 for each different viewdatas set 118 automatically.

Although system 100 given here and method are relevant to and identify the visual media data and describe, but system's 100 methods also can be used for identifying the data of text data or other types.

The embodiment of phrase match system 100 comprises at least one processor that directly or indirectly is coupled by system bus and memory component such as data, address and/or control bus.The local storage, mass storage and the cache memory that use the term of execution that memory component can being included in program code actual, cache memory provides the interim storage of some program code at least, so as to reduce in the process of implementation must be from mass storage the number of times of retrieval coding

Should also be noted that to use for some operation at least of described method and be stored in computer-usable storage medium so that the software instruction of being carried out by computing machine is realized.As example, the embodiment of computer program comprises the computer-usable storage medium of storing computer-readable program, when this computer-readable program is carried out on computers, make to comprise this computing machine executable operations from the operation of electronic document information extraction.

Although the operation of the method here is shown and described with particular order, the order of the operation of each method can be changed, thus some operation can reverse order carry out, thereby perhaps some operation can be carried out with other operations at least in part concomitantly.In another embodiment, the instruction of different operating or child-operation can be intermittently and/or the mode that replaces implement.

The present invention can take complete hardware embodiment, complete software embodiment or comprise the form of the embodiment of software and hardware both elements.In one embodiment, the present invention realizes that in software described software includes but not limited to firmware, resident software, microcode etc.

In addition, embodiments of the invention can take can by computing machine can with or the form of the computer program of computer-readable medium visit, it provides by computing machine or any instruction execution system and has carried out or the program code of carrying out associated with it.For this illustrative purposes, computing machine can with or computer-readable medium can be can comprise, storage, transmission, propagation or transmission procedure be in order to carried out or any device of carrying out associated with it by instruction execution system, device or equipment.

Computing machine can use or computer-readable medium can be electronics, magnetic, light, electromagnetism, infrared or semiconductor system (or device or equipment), or is propagation medium.The example of computer-readable medium comprises semiconductor or solid-state memory, tape, dismantled and assembled computer format floppy, random-access memory (RAM), ROM (read-only memory) (ROM), rigid magnetic disks and CD.The current example of CD comprises compact disc read-only memory (CD-ROM), read-write CD (CD-RW) and digital video disc (DVD).

I/O or I/O equipment (including but not limited to keyboard, display, pointing apparatus) can be directly or the I/O controller by the centre be connected to system.In addition, network adapter also can be connected to system, so that data handling system can be connected to other data handling systems or remote printer or memory device etc. by special use or the public network of centre.Modulator-demodular unit, cable modem and Ethernet card only are several examples of current available network type of adapter.

The specific detail of a plurality of embodiment is provided in the above description.Yet some embodiment can be with being implemented, and be less than all these details.In other cases, for concise and to the point and clear for the purpose of, description the level of detail of some method, process, assembly, structure and/or function is limited to realize a plurality of embodiment of the present invention.

Although described and shown specific embodiment of the present invention, the present invention is not limited to particular form or the arrangement of the parts so describing and illustrate.Scope of the present invention will be limited by claims and equivalent thereof.

Claims

1. method that is used for going to identify the visual media data comprises:

To merge into the average image from the image sequence of visual media data acquisition;

Define the relatively-stationary part that is confirmed as of the average image, wherein each is defined partly and is identified by the relevant position in the average image;

Generation comprises that each that be defined part and be used for the average image is defined the template of the relevant position of part; And

By obfuscation be defined the part content make a return journey identification described image sequence.

2. the process of claim 1 wherein, merge image sequence and also comprise:

The all images of striding in the image sequence comes average pixel value, to obtain the average image; And

From the average image, filter out the pixel value that does not satisfy the preset frequency threshold value.

3. the method for claim 1 also comprises:

Applying template is to remove to identify other image sequences in the described visual media data acquisition.

4. the process of claim 1 wherein that the relatively-stationary part that is confirmed as that defines the average image also comprises:

Define from the phase joined assemblies of the average image to find character and to produce character picture;

Define from the word of the average image to produce word image; And

Keep to be defined part, described being defined character and overlapping from the word that is defined of word image from character picture that is defined predetermined percentage in the part.

5. the method for claim 4, wherein, the word that defines from the average image also comprises:

Analyze the part of the average image to obtain the degree of confidence that analyzed part comprises text; And

Satisfy the word threshold value in response to definite degree of confidence, determine that the part of analyzing is word candidates.

6. the method for claim 1 also comprises:

Merge and be positioned at being defined partly to form phrase within the predetermined mean distance mutually.

7. the method for claim 1 also comprises:

Import the part that is defined of improving the average image based on the user; And

Import the authority of setting up for being defined part of checking from the user, wherein, the described authority of checking determines when that being defined content partly can check for given user.

8. a visual media goes recognition system, comprising:

The image combiner, it is configured to and will merges into the average image from the image sequence of visual media data acquisition; And

Remove to identify engine, it is configured to:

9. the system of claim 8, wherein, described image combiner also is configured to:

10. the system of claim 8 also comprises template generator, and it is configured to:

11. the system of claim 8, wherein, the relatively-stationary part that is confirmed as that defines the average image also comprises:

Define from the word of the average image to produce word image; And

12. the system of claim 11, wherein, the word that defines from the average image also comprises:

13. the system of claim 8 wherein, describedly removes to identify engine and also is configured to:

Import the part that is defined of improving the average image based on the user.