GB2500264A - Removing or obscuring sensitive medical image

Info

Publication number
GB2500264A
GB2500264A
Authority
GB
United Kingdom
Prior art keywords
data structure
image data
subject
pixels
image
Prior art date
Legal status
Withdrawn
Application number
GB1204686.8A
Other versions
GB201204686D0 (en)
Inventor
Alexander Gorban
Dmitry Matsypaev
Colin Lamond
Current Assignee
BVXL Ltd
Original Assignee
BVXL Ltd
Priority date
Filing date
Publication date
Application filed by BVXL Ltd filed Critical BVXL Ltd
Priority to GB1204686.8A
Publication of GB201204686D0
Priority to PCT/GB2013/050671 (WO2013136093A2)
Publication of GB2500264A

Classifications

    • G06T 5/77 Image enhancement or restoration: retouching; inpainting; scratch removal
    • G06T 7/12 Image analysis: edge-based segmentation
    • G06V 20/62 Scene-specific elements: text, e.g. of license plates, overlay texts or captions on TV images
    • H04N 1/00872 Preventing unauthorised reproduction: modifying the reproduction by image quality reduction, e.g. distortion or blacking out
    • H04N 1/0084 Preventing unauthorised reproduction: determining the necessity for prevention
    • G06T 2207/10132 Image acquisition modality: ultrasound image
    • G06T 2207/30004 Subject of image: biomedical image processing
    • G16H 30/20 ICT specially adapted for handling medical images, e.g. DICOM, HL7 or PACS
    • H04N 2201/0079 Type of still picture apparatus: medical imaging device

Abstract

A method of storing image data comprising images of a subject generated by a medical imaging device is disclosed. The method comprises capturing the image data, receiving subject identification metadata, analysing at least one selected element of the image data to detect features identifying the subject, and modifying these elements by removing or obscuring any such detected features. The modified image is then stored as a subject record together with the subject identification metadata. The modified image may then be shared or transmitted without breaching data security protocols. The images may be ultrasound images captured using the DICOM protocol, and the analysis may involve an edge mask, possibly generated using a Canny edge detector, and/or trapezoidal fuzzy numbers.

Description

IMAGE DATA STORAGE AND SHARING
This invention relates to a method of storing image data comprising images of a subject generated by a medical imaging device and to a method of sharing image data comprising images of a subject generated by a medical imaging device. It also relates to a method of detecting text features in image data.
Whilst the invention relates to any medical imaging device such as X-ray, computerised tomography (CT), and magnetic resonance imaging (MRI) scanners, it finds particular use with ultrasound scanning equipment.
Ultrasound scanning is used for a wide variety of medical imaging purposes, in particular for obtaining images of foetuses during gestation to monitor their prenatal development. With legacy ultrasound equipment, and indeed with modern portable equipment, the images are transitory, simply being displayed on a monitor screen whilst the ultrasound probe is in contact with the patient. Many scanners allow a record of a static image to be made on a thermal printer and some allow a video to be made on a DVD or stored in a video file format on a memory stick. Image data can also be extracted from some ultrasound scanners digitally, for example using the Digital Imaging and Communications in Medicine (DICOM) standard.
It is desirable for the images generated by an ultrasound scanner to be stored and shared for a variety of reasons. Firstly, clinicians may wish to make use of the images for diagnostic purposes after the scan has been taken, to share them with colleagues (possibly in other clinics) to obtain a second opinion, and for teaching students. Another popular use is by the patients themselves, who are often keen to share the images from an ultrasound scan of an unborn child with family and friends.
There are problems with the current approaches to this, however, from a data security perspective. Images from an ultrasound scan are normally labelled with text identifying the patient, including for example, their name, date of birth, patient identification number, and other similar items. It is a breach of data security protocols in many countries for images bearing such information to be
stored and/or distributed to third parties without appropriate consent being obtained from the patient beforehand. This limits the possibilities for using such images for clinical and educational purposes as mentioned above without laborious editing of the images to remove the identifying text.
From the patient's perspective, whilst they are free to distribute the images as they wish, hard copy media are simply an inefficient way to share them with family and friends who may live a significant distance away and seldom see the patient.
In accordance with a first aspect of the invention, there is provided a method of storing image data comprising images of a subject generated by a medical imaging device, the method comprising:
a) capturing the image data;
b) receiving subject identification metadata;
c) analysing at least one selected element of the image data to detect features identifying the subject and modifying the or each selected element of the image data by removing or obscuring any such detected features; and
d) storing a subject record comprising the or each modified selected element of the image data and the subject identification metadata.
By storing a record that comprises image data from which features identifying the subject have been removed, it is possible to distribute the image data for educational and clinical purposes without breaching data security protocols. The above-mentioned problems are thereby overcome.
The features identifying the subject are usually text features, such as graphical text or a visual representation of text or a caption including text.
The image data may be captured either directly from the medical imaging device or indirectly via an intermediate device.
The steps of the method are carried out by one or more computer devices. For example, step (c) is carried out automatically using a computer device.
In a preferred embodiment, the medical imaging device is an ultrasound scanner.
The subject record typically further comprises a node identification number allocated to the medical imaging device for uniquely identifying the medical imaging device.
The subject identification metadata preferably comprises one or more of the subject's name, the subject's e-mail address and a unique identification number allocated to the subject.
The or each selected element of the image data may comprise at least one video object selected by a user. For example, this video object may be a video file captured from the medical imaging device in one or more of a variety of formats, such as DV or MPEG-4.
The or each selected element of the image data may comprise at least one still image object selected by a user. For example, this still image object may be an image file captured from the medical imaging device in one or more of a variety of formats, such as JPEG.
Typically, step (d) further comprises transmitting the subject record to a remote server. In a preferred embodiment, the server is accessible on the Internet so that the subject can access their subject record for sharing purposes.
Preferably, the method further comprises constructing a manifest data object, which specifies the or each modified selected element of the image data included in the subject record, and including the manifest in the subject record. This enables a straightforward way of validating the subject record on storage and subsequently as explained below. The manifest is typically a list, for example stored in a file, of the video and/or still image objects that comprise the image data.
In a preferred embodiment, step (c) comprises:
d) forming an edge mask comprising only edge pixels in the image data;
e) for each row of the edge mask, forming a data structure representing candidate text elements consisting of the start and end positions in the row of contiguous horizontal edges; and
f) for each data structure representing candidate text elements, calculating confidence values for the data structure and for each portion of the data structure on either side of the largest gap between adjacent contiguous horizontal edges; and either
i) replacing the data structure with two data structures, each consisting of one of the portions of the data structure on either side of the largest gap, and then repeating step (c), if the confidence values do not meet a predetermined set of confidence criteria; or
ii) adding the data structure to a data structure representing detected text elements.
The edge mask is preferably formed in step (d) by applying an edge detection algorithm, such as a Canny edge detector, to the image data.
The method may further comprise performing an adaptive thresholding algorithm on the edge mask prior to step (e) such that the edge pixels in the edge mask have a first binary value, all other pixels having a second binary value.
The start position of a contiguous horizontal edge may be detected in step (e) by detecting the transition along a row from pixels having the second binary value to pixels having the first binary value and/or by detecting a pixel having the first binary value in the left-hand most position along the row.
The end position of a contiguous horizontal edge may be detected in step (e) by detecting the transition along a row from pixels having the first binary value to pixels having the second binary value and/or by detecting a pixel having the first binary value in the right-hand most position along the row.
The method preferably further comprises detecting regions of connected pixels having the first binary value and removing detected regions from the edge mask that do not meet predefined size criteria. Typically, the size criteria
include the height and aspect ratio of a minimum containing rectangle around a detected region. This not only speeds up the processing but improves the quality of the results of text element detection as irrelevant data is not included in the processing, reducing the likelihood of false detection.
In one embodiment, the confidence values do not meet the predetermined set of confidence criteria if:
i) the confidence value for the data structure is less than a first threshold and the confidence value for either portion of the data structure on either side of the largest gap exceeds the confidence value for the data structure; or
ii) all of the confidence values are below a second threshold.
Typically, the method further comprises forming an image mask from the data structure representing detected text elements by setting pixels in the image mask to a value depending on the confidence value for each data structure representing candidate text elements added to the data structure representing detected text elements.
In this image mask, the pixels in the image mask may be set to the values depending on the confidence value for each data structure representing candidate text elements by multiplying 255 by the confidence value. The confidence value typically ranges from a value of 0 to 1 and therefore this procedure embeds the confidence value in the image mask.
Typically, the method further comprises removing connections between rows of pixels in the image mask that fall below a threshold value and/or removing gaps between rows of pixels in the image mask that fall below a threshold value.
The method may further comprise performing a thresholding algorithm on the image mask.
The method may further comprise performing a morphological dilation algorithm on the thresholded image mask.
Typically, the method further comprises removing or obscuring text elements in the image data by modifying pixels in the image data relating to detected text elements according to the data structure representing detected text elements.
The pixels are preferably modified by an inpainting algorithm.
In accordance with a second aspect of the invention, there is provided a method of sharing image data comprising images of a subject generated by a medical imaging device, the method comprising:
a) receiving a subject record comprising the image data and subject identification metadata, wherein the image data has been previously modified
to remove any features identifying the subject;
b) storing the subject record;
c) receiving login details from a user;
d) authenticating the login details to confirm that the user is the subject identified by the subject identification metadata; and
e) receiving a sharing command from the user, the sharing command indicating a selected one of a plurality of sharing services over which the image data is to be shared with a third party, and sharing the image data over the selected sharing service.
This method allows a straightforward way for a subject to share image data generated by a medical imaging device, such as an ultrasound scanner, with friends and family. It is vastly more efficient than the distribution of hard copies on paper or DVD mentioned above. Furthermore, since the image data has been previously modified to remove any features identifying the subject there is no breach of data security protocols with this method, even where the subject record is received at a third party server.
The steps of the method are carried out on a computer device.
The features identifying the subject are usually text features, such as graphical text or a visual representation of text or a caption including text.
In a preferred embodiment, the medical imaging device is an ultrasound scanner.
The subject record typically further comprises a node identification number allocated to the medical imaging device for uniquely identifying the medical imaging device.
The subject identification metadata preferably comprises one or more of the subject's name, the subject's e-mail address and a unique identification number allocated to the subject.
The image data may comprise at least one video object selected by a user. For example, this video object may be a video file captured from the medical imaging device in one or more of a variety of formats, such as DV or MPEG-4.
The image data may comprise at least one still image object selected by a user. For example, this still image object may be an image file captured from the medical imaging device in one or more of a variety of formats, such as JPEG.
Normally, the method further comprises validating the received subject record, prior to storing the subject record, by confirming that the subject record comprises all the image data specified in a manifest and/or the validity of the image data and/or the validity of a node identification number allocated to the medical imaging device for uniquely identifying the medical imaging device. If validation fails, the subject record is usually placed in a queue for manual investigation.
The manifest is usually part of the subject record and is typically a list, for example stored in a file, of the video and/or still image objects that comprise the image data.
Typically, step (b) comprises storing a unique subject record identification number in a database record along with the subject identification data and one or more uniform resource locators indicating the location of the image data in a
separate file system. The separate file system could be a cloud-based file system, such as Amazon's S3 service. It could alternatively be a local file system.
In one embodiment, the method further comprises transmitting the image data from the subject record to one or more e-mail addresses specified in the subject record.
The sharing services typically comprise one or more social media services and/or e-mail. The skilled person will be aware of such social media services (e.g. YouTube, Twitter and Facebook) and how to share the image data over these services using the application programming interfaces (APIs) provided by the operators of these services for that purpose.
In accordance with a third aspect of the invention, there is provided a method of sharing image data representing images of a subject generated by a medical imaging device, the method comprising a combination of a method according to the first aspect of the invention followed by a method according to the second aspect of the invention.
In accordance with a fourth aspect of the invention, there is provided a system comprising one or more capture devices adapted to perform a method according to the first aspect of the invention, each of which is coupled to a respective medical imaging device, in use, and a remote storage device adapted to perform a method according to the second aspect of the invention, the remote storage device and the or each capture device together forming a network.
In accordance with a fifth aspect of the invention, there is provided a method of detecting text elements in image data, the method comprising:
a) forming an edge mask comprising only edge pixels in the image data;
b) for each row of the edge mask, forming a data structure representing candidate text elements consisting of the start and end positions in the row of contiguous horizontal edges; and
c) for each data structure representing candidate text elements, calculating confidence values for the data structure and for each portion of the data structure on either side of the largest gap between adjacent contiguous horizontal edges; and either
i) replacing the data structure with two data structures, each consisting of one of the portions of the data structure on either side of the largest gap, and then repeating step (c), if the confidence values do not meet a predetermined set of confidence criteria; or
ii) adding the data structure to a data structure representing detected text elements.
This method provides a straightforward way of detecting text elements in image data, which finds a multitude of uses in image processing situations. One such use is to find text elements in image data comprising images of a subject generated by medical imaging devices, such as an ultrasound scanner, that might identify the subject. The text features are, for example, graphical text or a visual representation of text or a caption including text.
In one embodiment, each data structure representing candidate text elements is placed in a stack in step (b). Then, in step (c)(i), it is possible to replace the data structure with two data structures consisting of the portions of the data structure on either side of the largest gap by popping the data structure representing candidate text elements and pushing the two data structures consisting of the portions of the data structure on either side of the largest gap onto the stack. In this way, the two data structures consisting of the portions of the data structure on either side of the largest gap are in the right place on the stack (i.e. at the top) for calculation of their confidence values when step (c) is repeated.
The edge mask is typically formed in step (a) by applying an edge detection algorithm, such as a Canny edge detector, to the image data.
Preferably, the method further comprises performing an adaptive thresholding algorithm on the edge mask prior to step (b) such that the edge pixels in the
edge mask have a first binary value, all other pixels having a second binary value.
Typically, the first binary value represents white pixels and the second binary value represents black pixels. The largest gap between adjacent contiguous horizontal edges is, in this case, the largest expanse of black pixels on a row between white pixels.
The start position of a contiguous horizontal edge may be detected in step (b) by detecting the transition along a row from pixels having the second binary value to pixels having the first binary value and/or by detecting a pixel having the first binary value in the left-hand most position along the row.
The end position of a contiguous horizontal edge may be detected in step (b) by detecting the transition along a row from pixels having the first binary value to pixels having the second binary value and/or by detecting a pixel having the first binary value in the right-hand most position along the row.
The method preferably further comprises detecting regions of connected pixels having the first binary value and removing detected regions from the edge mask that do not meet predefined size criteria. Typically, the size criteria include the height and aspect ratio of a minimum containing rectangle around a detected region. This not only speeds up the processing but improves the quality of the results of text element detection as irrelevant data is not included in the processing, reducing the likelihood of false detection.
In a preferred embodiment, the confidence values do not meet the predetermined set of confidence criteria if:
i) the confidence value for the data structure is less than a first threshold and the confidence value for either portion of the data structure on either side of the largest gap exceeds the confidence value for the data structure; or
ii) all of the confidence values are below a second threshold.
Preferably, the method further comprises forming an image mask from the data structure representing detected text elements by setting pixels in the image
mask to a value depending on the confidence value for each data structure representing candidate text elements added to the data structure representing detected text elements.
In this image mask, the pixels in the image mask may be set to the values depending on the confidence value for each data structure representing candidate text elements by multiplying 255 by the confidence value. The confidence value typically ranges from a value of 0 to 1 and therefore this procedure embeds the confidence value in the image mask.
Typically, the method further comprises removing connections between rows of pixels in the image mask that fall below a threshold value and/or removing gaps between rows of pixels in the image mask that fall below a threshold value.
Preferably, the method further comprises performing a thresholding algorithm on the image mask. This results in an image mask in which all pixels with a confidence value higher than the threshold are present (e.g. by making them white) whereas those with a confidence value lower than the threshold are not present (e.g. by making them black). It is straightforward then to identify the text elements in the original image data using this mask.
Typically, the method further comprises performing a morphological dilation algorithm on the thresholded image mask.
In a preferred embodiment, the method further comprises removing or obscuring text elements in the image data by modifying pixels in the image data relating to detected text elements according to the data structure representing detected text elements.
The pixels are typically modified by an inpainting algorithm.
An embodiment of the invention will now be described with reference to the accompanying drawings, in which:
Figure 1 shows a block diagram of a system for carrying out the invention;
Figure 2 shows a flow diagram of an implementation of the invention;
Figure 3 shows a flow diagram of a text removal technique; and
Figure 4 shows a detailed flow chart of one module used in the text removal technique for identifying text elements in image data.
In Figure 1, an ultrasound scanner 1 is shown. The ultrasound scanner 1 is connected to a video capture device 2. This comprises a computer device with a network connection for connection to the Internet 3. The video capture device 2 captures image data from the ultrasound scanner 1 based on user input received from a touch screen input device 5 as will be explained below. In one embodiment, the image data is captured in digital format over a network from the scanner, for example using the DICOM medical imaging information exchange format. In another embodiment, the image data is captured by way of an analogue-to-digital video converter, such as the ADVC55 from Grass Valley USA, LLC. This receives analogue video directly from the ultrasound scanner 1, for example in S-Video or composite video formats, and converts it to digital form, for example DV format, for further processing by the computer device.
A second ultrasound scanner 6 is also shown in Figure 1 along with a respective video capture device 7 and touch screen input device 8. These are identical to the ultrasound scanner 1, video capture device 2 and touch screen input device 5. They may be situated in the same hospital or clinic as ultrasound scanner 1 and its appended video capture device 2 or in another, totally unrelated hospital or clinic. They are shown merely to illustrate that the invention is scalable for use with an unlimited number of ultrasound scanners. The only difference is that each video capture device 2 is programmed with a unique node identification number when it is installed. This serves the purpose of being able to track the source of captured image data to a particular ultrasound scanner 1.
Also shown in Figure 1 are a laptop 9 and a server 10, the function of which will be explained below.
Figure 2 shows a block diagram of the method performed by the video capture device 2 (or 7). All of the interaction with a clinician or other user is performed using the touch screen input device 5 (or 8). The method starts in step 20 when a clinician logs in. This is done in one of the conventional ways, for example using a username and password. Assuming that the login is successful, the clinician enters the patient identification details into the touch screen input device 5. The patient identification details may be the patient's name or a number assigned to them, for example a patient number allocated by the hospital to that particular patient. The patient identification details may be entered manually using a keyboard displayed on the touch screen or by scanning a barcode printed on the patient's notes.
A warning message may also be displayed in step 21 to remind the clinician to advise the patient that their personal data will leave the control of the hospital or clinic during the process. The patient may also be required to confirm their acceptance of this by entering a secret password they have previously been allocated for this purpose.
During the ultrasound scan, the video capture device 2 captures all the video images output by the ultrasound scanner 1. As mentioned above, this is captured from the ultrasound scanner 1 either digitally, for example using the DICOM protocol, or using an analogue-to-digital video converter. The resulting image data is displayed in step 22 on the touch screen input device 5 to the clinician and/or patient using video player software running on the computer device within the video capture device 2. The clinician and/or patient can then, in step 22, select either the whole captured video sequence or portions of it or both. Each selected portion may be either a section of video or a still image.
The clinician may then enter, in step 23, one or more e-mail addresses to which the selected portions of the image data or a notification that the selected portions are available for sharing should be sent in due course. These e-mail addresses will typically be the patient's e-mail address and the clinician's e-mail address. They may also be a predefined group of e-mail addresses identified by a group identifier.
In a typical embodiment, the captured image data is video image data in DV format. The software running on the computer device within video capture device 2 extracts, in step 24, a JPEG file for each selected still image portion and a DV file for each selected video sequence. In step 25, a metadata file is constructed. This is a text file indicating the name and e-mail address of the patient, the node identification number allocated to the ultrasound scanner 1, the clinician's identification number (e.g. their username entered above), the start and end time of the scan, a manifest of all the files for the still images and video sequences selected, and any e-mail addresses selected in step 23.
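By way of illustration only, a metadata file of this kind could be assembled with a short routine such as the Python sketch below; the field names and the simple key: value layout are assumptions made for the example, since the exact format of the file is not specified here:

def write_metadata_file(path, patient_name, patient_email, node_id, clinician_id,
                        scan_start, scan_end, selected_files, notify_emails):
    # Hypothetical layout: one "key: value" pair per line of the text file.
    # scan_start and scan_end are datetime objects; the other arguments are strings or lists.
    lines = [
        "patient_name: " + patient_name,
        "patient_email: " + patient_email,
        "node_id: " + node_id,                      # uniquely identifies the scanner installation
        "clinician_id: " + clinician_id,
        "scan_start: " + scan_start.isoformat(),
        "scan_end: " + scan_end.isoformat(),
        "manifest: " + ", ".join(selected_files),   # JPEG and DV files selected in step 24
        "notify: " + ", ".join(notify_emails),      # e-mail addresses entered in step 23
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")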
Each of the DV and JPEG files is then subjected, in step 26, to an image processing method for removing or obscuring any text features present in the image data that could be used to identify the patient. This will be explained in detail below with reference to Figure 3. The image data is, as a result of this method, anonymised so that it can be sent from the hospital where the ultrasound scanner 1 is located without breaching data security protocols.
In step 27, each of the DV files is converted to MPEG-4 and the resulting bundle of MPEG-4 files, JPEG files and text metadata file is zipped, for example using Lempel-Ziv or similar method. The conversion to MPEG-4 and zipping are carried out to compress the data.
The zipped bundle of files is then transmitted over the Internet 3 to a remote server 10 in step 28. This is done using a file replication process, such as rsync, over a virtual private network (VPN), which is encrypted to protect the data in transit. One advantage of this technique is that the hospital only needs to open one TCP/IP port to enable the transmission. It is therefore relatively secure.
Finally, the zipped bundle of files and all the captured video data from ultrasound scanner 1 are deleted in step 29 so that no local copy remains.
In step 30, the remote server 10 receives the transmitted bundle of files and validates them. The validation process involves checking that each of the JPEG and MPEG-4 files specified in the manifest is actually present in the
transmitted bundle of files and that it is not corrupted. It also involves checking that a valid node identification number has been included in the text metadata file (e.g. that the node identification number is one that has been allocated and is still in use).
Each of the MPEG-4 files is then converted to a variety of different formats to suit the different types of devices to which the video sequences might need to be shared. For example, the files are converted to appropriate formats to ensure that the video sequences are viewable in Flash and HTML5 browsers, and on iPhone and Android smartphones.
A database is then updated by inserting a new record in step 32. This record includes a unique subject record identification number allocated by the server 10 along with the subject identification data and a uniform resource locator (URL) indicating the location of each of the JPEG files and converted video files in a separate file system. In this embodiment, the separate file system is Amazon's S3 cloud-based system, although any other file system could be used. The unique subject record identification number is used rather than the patient's name or other identifying information so that if the system is compromised, there is no indication that the video or still image files correspond to any particular patient. At the same time as the database record is inserted, the JPEG and video files are stored in the locations referred to by the URLs in the database record.
In step 33, e-mails are sent to the patient and/or clinician if their e-mail addresses were included in the text metadata file. This e-mail will normally simply indicate that the images and video sequences are now available for sharing.
In step 34, a user logs in to start the sharing process. This will typically be a patient or clinician and they will log in using a username and password. A clinician would be granted access to any uploaded records that he or she is associated with, whereas a patient would only be granted access to their own
particular records. The system displays the accessible records and allows the user to select one of these.
Then, in step 35, the user selects an image file or video file and a sharing option for that. In the case of a clinician, they will only be allowed to download it or e-mail it to other authorised clinicians. In the case of a patient, they will be allowed to share it via e-mail or a social networking service, such as YouTube, Twitter or Facebook. The server will interface with the selected service using the APIs provided for uploading data to these services. In this way, the patient can easily share the image and/or video files with their friends and family.
Figure 3 shows the image processing method used to detect and remove text elements in the image data. The method starts in step 40 by loading a file of image data for processing. In step 41, the file is analysed to determine whether it is a video object, a still image object or an unknown format. If it is an unknown format, processing finishes. If it is a still image object then processing continues in a still image processing branch commencing with step 42, whereas if it is a video object then processing continues in a video processing branch commencing with step 43. The two processing branches make use of similar techniques. Indeed, the modules used in the still image processing branch are a subset of those used in the video processing branch. However, the video image processing branch is more complicated, operating over two passes, to cater for the more complicated nature of a video sequence.
The still image processing branch will be described first. In step 42, a pair of predefined masks is loaded. These masks may be created by a user to define areas of image data that they know will always contain text and that they know will always be free of text. A positive image mask indicates areas (in white) where it is assumed that there will always be text, whereas a negative mask indicates areas (again in white) where it is assumed that there will never be text. The use of these masks is optional and they can simply be left empty (i.e. black) if not required.
In step 44, a module is used to detect text features in the still image. The detailed operation of this module will be described below with reference to Figure 4. In the meantime, it suffices to say that it returns an image mask indicating areas (in white) where text elements have been detected in the image data. In step 45, the image mask returned in step 44 is modified using the predefined image masks loaded in step 42 by combining the image mask of step 44 with the positive image mask and the complement of the negative image mask. This ensures that all areas where a user has indicated that there are always text elements are included on the resultant image mask and that the resultant image mask does not include areas where a user has indicated that there are never text elements.
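As a minimal sketch (assuming the three masks are single-channel arrays in which the value 255 marks white areas), this combination could be expressed as:

import numpy as np

def apply_predefined_masks(detected_mask, positive_mask, negative_mask):
    # Union with the "always text" areas of the positive mask...
    combined = np.maximum(detected_mask, positive_mask)
    # ...then intersection with the complement of the "never text" areas of the negative mask.
    combined[negative_mask > 0] = 0
    return combined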
The text elements in the original still image data corresponding to the areas of text indicated in the resultant image mask are then removed in step 46 using an inpainting algorithm, for example from the OpenCV library. This inpainting procedure obscures the detected text using pixels from areas around the detected text. This makes the obscuring relatively unnoticeable. The modified image data is then saved to replace the original still image data.
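A hedged example of this inpainting step, using the inpainting function available in the OpenCV library (the three-pixel radius and the Telea variant are assumed, tunable choices rather than values taken from the text):

import cv2

def remove_text(image_bgr, text_mask):
    # text_mask is a single-channel 0/255 mask in which white pixels mark detected text.
    # cv2.inpaint fills the masked areas using pixels from the surrounding region,
    # which is what makes the obscuring relatively unnoticeable.
    return cv2.inpaint(image_bgr, text_mask, 3, cv2.INPAINT_TELEA)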
In the video processing branch, the predefined masks are loaded in step 43. This is identical to step 42 in the still image processing branch and need not be described further. In step 47, a set of history accumulators are initialised for use later. Their purpose will be explained below.
In the first pass, processing continues in a loop around steps 48, 49 and 50 until all frames of the video sequence have been processed. In step 48, the next frame in the sequence is loaded. In step 49, text features in the frame are detected using the same module as in step 44 of the still image processing branch. The detailed operation of this module will be described below with reference to Figure 4. In step 50, the resultant image mask returned by step 49 is added to the history accumulators initialised in step 47.
After the first pass is complete, the history accumulators are analysed in step 51. This looks for small anomalies in the image masks between frames.
Specifically, it detects where the masks for single frames or small groups of frames in the video sequence indicate the existence of text elements when frames either side do not. It also detects where the masks for single frames or small groups of frames in the video sequence indicate the absence of text elements when frames either side indicate they are present. These anomalies are removed by modifying the masks either to remove the spurious indication of text elements or to include the text elements where they are spuriously absent.
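The text does not prescribe a particular rule for this, but one simple way to implement the anomaly removal, assuming the per-frame masks are retained, is a temporal majority vote over a sliding window of frames, as in the sketch below (the window size is an assumption):

import numpy as np

def smooth_mask_history(mask_history, window=5):
    # mask_history: list of per-frame binary (0/255) text masks of equal size.
    stack = np.stack(mask_history) > 0            # shape: (frames, height, width)
    half = window // 2
    smoothed = []
    for i in range(len(mask_history)):
        lo, hi = max(0, i - half), min(len(mask_history), i + half + 1)
        votes = stack[lo:hi].mean(axis=0)
        # A pixel is treated as text only if most frames in the window agree, which
        # removes spurious single-frame appearances and disappearances of text elements.
        smoothed.append(np.where(votes >= 0.5, 255, 0).astype(np.uint8))
    return smoothed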
In the second pass, a loop of steps 53, 54, 55 and 56 operates over each frame in the video sequence in turn. In step 53, the next frame in the sequence is loaded. In step 54 the corresponding image mask modified in accordance with the history accumulators in step 51 is loaded and modified using the predefined masks in step 55. The predefined masks are used in precisely the same way as in step 45. Then in step 56, the inpainting procedure (discussed above with reference to step 46) is used on the frame loaded in step 53 to remove detected text elements in accordance with the image masks modified as appropriate in step 55. The modified video sequence is then saved to replace the original. Once the second pass is complete, the audio stream is copied from the original video sequence to the modified version in step 57.
The detection of the text elements in steps 44 and 49 will now be explained in more detail with reference to Figure 4, which shows the method performed by the module used in steps 44 and 49.
First, in step 60, the image data (which may represent a still image or a frame in a video sequence) is processed by an edge detection algorithm, such as a Canny edge detector, and then adaptively thresholded and transformed to a binary image. The binary image contains only black and white pixels. The white pixels are the pixels of further interest.
Then connected regions of white pixels on the binary image are detected. These regions are analysed according to their height and aspect ratio, and those that do not meet predefined height and aspect ratio criteria are filtered
out. This leaves only regions that could conceivably contain text-like items and reduces the processing load required in the following steps of the algorithm. The aspect ratio is considered to meet the criteria if it exceeds a ratio of 2.5 (measured as the ratio of width to height). The height criterion is considered met if the ratio of the height of the connected region of pixels to the image height is within a range of 0.011 to 0.04.
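A minimal OpenCV sketch of this step follows; the Canny thresholds are assumed values, while the aspect-ratio and relative-height criteria are the ones stated above:

import cv2
import numpy as np

def make_filtered_edge_mask(gray):
    edges = cv2.Canny(gray, 100, 200)                        # binary edge map (0 / 255)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(edges, connectivity=8)
    mask = np.zeros_like(edges)
    img_h = gray.shape[0]
    for i in range(1, n):                                    # label 0 is the background
        w = stats[i, cv2.CC_STAT_WIDTH]                      # bounding box of the connected region
        h = stats[i, cv2.CC_STAT_HEIGHT]
        aspect_ok = h > 0 and (w / h) > 2.5                  # width-to-height ratio exceeds 2.5
        height_ok = 0.011 <= (h / img_h) <= 0.04             # height relative to the image height
        if aspect_ok and height_ok:
            mask[labels == i] = 255                          # keep only text-like regions
    return mask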
Next, in step 61, a horizontal transform is applied to the remaining regions to pick out the high-frequency alternation between background and foreground pixels, which is a significant feature of text elements in image data. The transform is performed separately for each row of pixels in the image data. The transitions along the row from black to white pixels are detected. The white pixels adjacent to these transitions are marked as separators. Furthermore, white pixels in the left-hand most position and white pixels in the right-hand most positions are detected and marked as separators. Thus, each separator marks the beginning or end of a contiguous horizontal region along the row of pixels.
In step 62, a data structure is formed for each row by forming an array indicating the position (i.e. the column position) along the row of each of the separators. Thus, the data structure represents (by their location) candidate text elements consisting of the start and end positions in the row of contiguous horizontal edges. Each data structure is placed on a stack for further processing.
Two metrics relating to these data structures, known as "words", are calculated. The first is the length, which is equal to the number of separators minus 1. The second is the maximum gap, which is the maximum number of black pixels between two adjacent separators. Thus, the maximum gap represents the largest gap between adjacent contiguous horizontal edges in a row.
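A sketch of how one row of the filtered edge mask could be turned into such a "word" data structure, together with its length and maximum-gap metrics (the dictionary representation is an assumption made for the example; a single-pixel run contributes its column twice, as both a start and an end separator):

import numpy as np

def build_word_for_row(row):
    # row: 1-D array of 0/255 values for one row of the filtered edge mask.
    cols = np.flatnonzero(row > 0)
    if cols.size == 0:
        return None
    # The first and last white pixel of each contiguous white run act as separators.
    starts = cols[np.r_[True, np.diff(cols) > 1]]
    ends = cols[np.r_[np.diff(cols) > 1, True]]
    runs = list(zip(starts.tolist(), ends.tolist()))
    separators = [c for run in runs for c in run]
    length = len(separators) - 1                             # number of separators minus 1
    gaps = [runs[i + 1][0] - runs[i][1] - 1 for i in range(len(runs) - 1)]
    max_gap = max(gaps) if gaps else 0                       # largest run of black pixels between runs
    return {"separators": separators, "length": length, "max_gap": max_gap}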
Confidence values for each "word" (i.e. data structure formed in step 62) and for each portion of each "word" either side of its maximum gap are then
calculated in step 63. Trapezoidal fuzzy numbers are used for this to determine the likelihood that each "word" (or portion) is part of a real word depending on the length and maximum gap calculated above. The trapezoidal fuzzy numbers are calculated from these two metrics, which are used because the length correlates with the number of letters in real words typically found on an ultrasound image and the maximum gap corresponds to the maximum distance found between letters in a real word. The confidence value of the "word" (or portion) is calculated using fuzzy set theory as the minimum value between two confidence criteria.
An explanation of how the confidence value of a "word" object is calculated follows. Let U be the universal set of all possible "word" objects. Its subset X contains all text-like "word" objects on the given image. X is built as a fuzzy set. Thus, X can be represented as the pair <U, m_X> where m_X : U → [0, 1] is the membership function, which determines the membership degree for each element in U to the set X. Note that a fuzzy number is a special kind of fuzzy set where the universal set is the set of real numbers R. Also, the fuzzy numbers calculated should satisfy the requirements of continuity, convexity and normalization. These requirements are always satisfied with trapezoidal fuzzy numbers, which are used in the algorithm being described.
Because of the definition of X, only those "word" objects that are found on the given image are considered. To estimate m_X(x), where x is an arbitrary "word" object, the length and maximum gap characteristics of the "word" object are used.
Let Y = <U, m_Y> be the fuzzy set of "word" objects whose lengths satisfy a first criterion to some degree, which is estimated as follows based on a trapezoidal fuzzy number y = <R, m_y>. The membership function m_y is determined on the set of real numbers and has the form of a trapezium, the shape of which is taken from assumptions about the "word" object's length on the image. Let len : U → R be the function which returns the length for any given "word" object. Then we estimate for any x from X the degree of satisfying the first criterion m_Y as m_Y(x) = m_y(len(x)).
Thus we estimate the membership degree of the "word" object to the fuzzy set Y using the membership degree of its length to the fuzzy number y.
Let Z = <U, m_Z> be the fuzzy set of "word" objects whose maximum gaps satisfy a second criterion to some degree, which is estimated as follows based on a trapezoidal fuzzy number z = <R, m_z>. Its membership function m_z is determined on the set of real numbers and has a trapezoidal form, the shape of which is taken from assumptions about the gaps in "word" objects. Let max_gap : U → R be the function which returns the maximum gap for any given "word" object. Then we estimate for any x from X the degree of satisfying the second criterion m_Z as m_Z(x) = m_z(max_gap(x)).
Thus there are two fuzzy sets Y and Z satisfying different criteria of text-like "word" objects. We require that both the criteria should be satisfied at the same time. This requirement corresponds to the operation of fuzzy set intersection. Thus, X = Y ∩ Z.
According to fuzzy set theory, the intersection of two fuzzy sets can be calculated as follows. For any x from X the membership function value m_X(x) is evaluated as m_X(x) = min(m_Y(x), m_Z(x)).
The value m_X(x) is the confidence value for x, where x is the "word" object.
For example, if a "word" object has 4 separators and a gap between two 20 contiguous edges of 2 pixels then the length of this "word" (the number of separators minus 1) is 3 and the maximum gap of this "word" is 2. Thus, len(x) = 3 and max_gap(x) = 2 where x is the "word" object being discussed. Assuming that y and z fuzzy numbers already defined in configuration settings used by the algorithm so that my (3)=0.45 and mz (2)=0.68, the total confidence 25 of x is equal to:
m_X(x) = min(m_Y(x), m_Z(x)) = min(m_y(len(x)), m_z(max_gap(x))) = 0.45.
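As an illustrative sketch, this calculation can be written as below; the corner points chosen here for the trapezoidal fuzzy numbers y and z are assumptions, standing in for the values that would come from the algorithm's configuration settings:

def trapezoidal_membership(v, a, b, c, d):
    # Membership function of a trapezoidal fuzzy number with corners a <= b <= c <= d:
    # 0 outside [a, d], 1 on [b, c], linear on the sloping sides.
    if v <= a or v >= d:
        return 0.0
    if b <= v <= c:
        return 1.0
    if v < b:
        return (v - a) / (b - a)
    return (d - v) / (d - c)

def word_confidence(word, y=(1, 4, 20, 30), z=(-1, 0, 3, 8)):
    m_y = trapezoidal_membership(word["length"], *y)         # degree of satisfying the length criterion
    m_z = trapezoidal_membership(word["max_gap"], *z)        # degree of satisfying the maximum-gap criterion
    return min(m_y, m_z)                                     # m_X(x) = min(m_Y(x), m_Z(x))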
In step 64, an assessment is made as to whether the required confidence criteria are met. If they are not met then the data structure is replaced on the
stack by each portion of the data structure on either side of its maximum gap. In other words, the data structure ("word") is split. Processing then proceeds back to step 63 where the confidence values will be calculated again. However, this time the confidence values are calculated for the first portion of the split "word" and for the portions on either side of its own maximum gap. This loop continues until the confidence criteria are met, at which point the data structure from the stack is added in step 66 to an output array.
The confidence criteria are considered not to be met if either the confidence value for the data structure is less than a first threshold and the confidence value for either portion of the data structure on either side of the largest gap exceeds the confidence value for the data structure; or if all of the confidence values are below a second threshold. Suitable values for the thresholds in general text detection processing, for example suitable for use in detecting text on ultrasound scan image data, are 0.75 for the first threshold and 0.25 for the second threshold.
This algorithm can be summarised in pseudo-code as follows:
For each row of pixels implement the following instructions:
1. allocate stack structure STACK to store "word" objects
2. allocate output vector OUT of "word" objects
3. build "word" object from the sequence of separators returned by horizontal transform for current row and PUSH it onto the STACK
4. initialise a confidence threshold value T1 (e.g. 0.75) (indicating a high enough confidence)
5. initialise a confidence threshold value T2 (e.g. 0.25) (T2 < T1) (indicating a confidence value that is too small)
6. WHILE STACK is not empty DO:
1. POP "word" W from the STACK
2. IF length (W) < 2
1. REMOVE W
2. next iteration
3. calculate confidence of W, conf(W)
4. BREAK "word" W on maximum gap to form left (L) and right (R) sub-
words
5. calculate confidence of L and R, conf(L) and conf(R)
6. IF [conf(W) < T1 AND MAX(conf(R), conf(L)) > conf(W)] OR
[MAX(conf(R), conf(L)) < T2 AND conf(W) < T2]
PUSH subwords L and R into the STACK
7. else PUSH W into OUT
8. go next iteration
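The pseudo-code above maps onto the following runnable Python sketch; the "word" dictionaries and the split_on_max_gap helper follow the representation used in the earlier sketches and are illustrative assumptions rather than the exact implementation:

def split_on_max_gap(word):
    # Split a "word" into the portions on either side of its largest gap between runs.
    seps = word["separators"]
    runs = list(zip(seps[0::2], seps[1::2]))
    gaps = [runs[i + 1][0] - runs[i][1] - 1 for i in range(len(runs) - 1)]
    k = gaps.index(max(gaps))
    def make(part):
        s = [c for run in part for c in run]
        g = [part[i + 1][0] - part[i][1] - 1 for i in range(len(part) - 1)]
        return {"separators": s, "length": len(s) - 1, "max_gap": max(g) if g else 0}
    return make(runs[:k + 1]), make(runs[k + 1:])

def detect_words_in_row(initial_word, conf, t1=0.75, t2=0.25):
    # conf is a function returning the confidence of a "word" (e.g. word_confidence above).
    stack = [initial_word] if initial_word else []
    out = []
    while stack:
        w = stack.pop()                                      # POP "word" W from the STACK
        if w["length"] < 2:                                  # too short to score or split: REMOVE W
            continue
        cw = conf(w)
        left, right = split_on_max_gap(w)                    # BREAK W on its maximum gap
        cl, cr = conf(left), conf(right)
        if (cw < t1 and max(cl, cr) > cw) or (max(cl, cr) < t2 and cw < t2):
            stack.extend([left, right])                      # PUSH the sub-words back onto the STACK
        else:
            out.append(w)                                    # PUSH W into OUT
    return out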
The output array is then used to build an image mask in step 67. This is done by copying all pixels from the first separator to the last one to the corresponding places on the mask. The pixel values are set to be equal to 255 multiplied by the "word's" confidence. Initially, the mask is totally black. In other words, all its pixels have an initial value of zero, which is unchanged if not modified by copying pixels to the mask from the output array.
The image mask is then processed to remove vertical gaps and vertical connections between pixels in the mask with a non-zero confidence (i.e. non-black pixels) that fall below a threshold. Again, trapezoidal fuzzy numbers are used to determine whether the vertical gaps and connections fall below the threshold based on the assumption that text height typically lies within a certain range of values. The confidence value of a group of pixels is recalculated after a vertical gap or connection is removed. If the confidence value increases then the removal is deemed correct. Otherwise, it is reversed.
The removal of vertical gaps and vertical connections between pixels is explained below in more detail. As already mentioned, after the horizontal transform is completed, all the "word" objects are projected onto an image mask with their confidence values. Thus, there is a single channel image mask where the intensity value of the pixel corresponds to its confidence value. Black pixels correspond to a minimum confidence value (i.e. 0) and white pixels to a maximum confidence value (i.e. 1).
The goal of this stage is to remove false vertical connections and fill false vertical gaps on the mask. The first step is to transpose the mask to place the columns of the mask in rows. This is not an essential step, and is only performed to make subsequent calculations faster. The following sequence of operations is executed for each row of the transposed mask separately.
A "column" object is formed containing the contiguous sequence of pixels from the row of the transposed mask along with the confidence values. The length of the "column" object is defined as the total number of pixels that are contained in it.
Let U_c be the universal set of "column" objects. Let col_len : U_c → R be the function which for each c from U_c returns its length. Let X_c = <U_c, m_Xc> be the fuzzy set of "column" objects that satisfy a text height criterion to some degree determined by the membership function m_Xc. Let x_c = <R, m_xc> be the trapezoidal fuzzy number whose membership function is defined from assumptions made about the height of text areas on the image to search. Then a relationship is established between the "column" objects' set membership function m_Xc and the fuzzy number's membership function m_xc for any c from U_c as follows:
m_Xc(c) = m_xc(col_len(c)).
A merge operation is then used on neighbouring "column" objects. If two "column" objects are neighbours on the same row of the transposed mask, the merge operation between them returns a new "column" object that satisfies the following requirements. First, the resultant "column" object contains all the pixels from both "columns" being merged and the pixels which lie between them. Second, the confidence values of the pixels between the "columns" being merged are set to the minimum confidence value among all the pixels of the "columns" being merged.
As an example of the "column" merge operation:
A row has the following sequence of pixels designated by their confidence values:
... 0 0.75 0.45 0.98 0.23 0 0 0 0 0 0.37 0.17 0.76 0.4 0 ...
The "column" objects are presented here as sequences of numbers in bold 5 text. The gaps between the "columns" are not in bold text. The gap between the two "columns" has a length equal to 5, both "columns" have a length equal to 4. The minimum confidence value between them is 0.17. the result of the merging operation on these "column" objects is:
... 0 0.75 0.45 0.98 0.23 0.17 0.17 0.17 0.17 0.17 0.37 0.17 0.76 0.4 0 ...
10 The new merged "column" object has a length equal to 4+4+5 = 13.
The pixels that previously belonged to the gap between the "columns" are assigned to the minimum confidence value among the pixels of the "columns" being merged.
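A small sketch of the merge operation, assuming each "column" is represented by the (start, end) indices of its pixels within a row of per-pixel confidence values:

import numpy as np

def merge_columns(row_conf, col_a, col_b):
    # row_conf: 1-D numpy array of confidences for one row of the transposed mask.
    # col_a and col_b: (start, end) index pairs of two neighbouring columns, col_a first.
    a_start, a_end = col_a
    b_start, b_end = col_b
    # Pixels in the gap take the minimum confidence found among the columns being merged.
    min_conf = min(row_conf[a_start:a_end + 1].min(), row_conf[b_start:b_end + 1].min())
    row_conf[a_end + 1:b_start] = min_conf
    return a_start, b_end                                    # the merged column spans both plus the gap

Applied to the example row above, the five zero-valued gap pixels become 0.17 and the merged "column" spans 4 + 4 + 5 = 13 pixels, as described.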
Then a vertical transform algorithm is used on the rows of the transposed mask. In this, initial "column" objects are first built from the assumption that all zero-pixels are parts of the gaps between "column" objects. For example, part of the row below:
... 0 0.75 0.45 0.98 0.23 0 0 0 0 0 0.37 0.17 0.76 0.4 0 ...
is broken on "columns" as:
... 0 0.75 0.45 0.98 0.23 0 0 0 0 0 0.37 0.17 0.76 0.4 0 ...
The "columns" above are marked with bold type.
Then an attempt is made to merge all neighbouring pairs of "columns" in the row. If the confidence value of the merging result is greater than the maximum confidence value in the "columns" being merged and the quantity of pixels originally belonging to the "columns" being merged that still appear in the resultant "column" is greater than a threshold value then the result of the
merging operation is accepted and retained instead of the original pair of "columns". Otherwise, the merging result is declined and the original "columns" are retained.
Next, for all "columns" in the row each pixel's confidence is recalculated as the 5 minimum value of the current pixel's confidence and the confidence values of the "column" which it belongs to. For example, if for the "column" object:
... 0 0.75 0.45 0.98 0.23 0.17 0.17 0.17 0.17 0.17 0.37 0.17 0.76 0.4 0 ...
the confidence value is 0.45 then the confidence values of its pixels will be changed to
... 0 0.45 0.45 0.45 0.23 0.17 0.17 0.17 0.17 0.17 0.37 0.17 0.45 0.4 0 ...
Next a thresholding procedure is used where pixels with too small a confidence value (lower than 0.011) are rejected.
The non-binary image mask of step 67 is then thresholded to turn it into a binary image mask in step 68. A morphological dilate operation is then performed and the resultant image mask is returned by the module. The resultant image mask can then be used to determine which pixels in the original image data should be obscured by the inpainting process referred to above. In this way, text elements can be detected and obscured to anonymise image data.
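A brief OpenCV sketch of these final two operations (the binarisation threshold and the kernel size are assumed values):

import cv2

def finalise_mask(confidence_mask, threshold=128):
    # confidence_mask: single-channel uint8 mask whose pixel values are 255 * confidence.
    _, binary = cv2.threshold(confidence_mask, threshold, 255, cv2.THRESH_BINARY)
    # Dilate slightly so that the inpainting covers the whole of each detected text element.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    return cv2.dilate(binary, kernel, iterations=1)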


CLAIMS
    1. A method of storing image data comprising images of a subject generated by a medical imaging device, the method comprising:
    a) capturing the image data;
    b) receiving subject identification metadata;
    c) analysing at least one selected element of the image data to detect features identifying the subject and modifying the or each selected element of the image data by removing or obscuring any such detected features; and
    d) storing a subject record comprising the or each modified selected element of the image data and the subject identification metadata.
    2. A method according to claim 1, wherein the subject record further comprises a node identification number allocated to the medical imaging device for uniquely identifying the medical imaging device.
    3. A method according to claim 1 or claim 2, wherein the subject identification metadata comprises one or more of the subject's name, the subject's e-mail address and a unique identification number allocated to the subject.
    4. A method according to any of the preceding claims, wherein the or each selected element of the image data comprises at least one video object selected by a user.
    5. A method according to any of the preceding claims, wherein the or each selected element of the image data comprises at least one still image object selected by a user.
    6. A method according to any of the preceding claims, wherein step (d) further comprises transmitting the subject record to a remote server.
    7. A method according to any of the preceding claims, further comprising constructing a manifest data object which specifies the or each modified selected element of the image data included in the subject record, and including the manifest in the subject record.
    8. A method according to any of the preceding claims, wherein step (c) comprises:
    d) forming an edge mask comprising only edge pixels in the image data;
    e) for each row of the edge mask, forming a data structure representing candidate text elements consisting of the start and end positions in the row of contiguous horizontal edges; and
    f) for each data structure representing candidate text elements, calculating confidence values for the data structure and for each portion of the data structure on either side of the largest gap between adjacent contiguous horizontal edges; and either
    i) replacing the data structure with two data structures, each consisting of one of the portions of the data structure on either side of the largest gap and then repeating step (c), if the confidence values do not meet a predetermined set of confidence criteria; or
    ii) adding the data structure to a data structure representing detected text elements.
    9. A method according to claim 8, wherein the edge mask is formed in step (d) by applying an edge detection algorithm, such as a Canny edge detector, to the image data.
    10. A method according to claim 8 or claim 9, further comprising performing an adaptive thresholding algorithm on the edge mask prior to step (e) such that the edge pixels in the edge mask have a first binary value, all other pixels having a second binary value.
    11. A method according to claim 10, wherein the start position of a contiguous horizontal edge is detected in step (e) by detecting the transition along a row from pixels having the second binary value to pixels having the first binary value and/or by detecting a pixel having the first binary value in the left-hand most position along the row.
    12. A method according to claim 10 or claim 11, wherein the end position of a contiguous horizontal edge is detected in step (e) by detecting the transition along a row from pixels having the first binary value to pixels having the second binary value and/or by detecting a pixel having the first binary value in the right-hand most position along the row.
    13. A method according to any of claims 8 to 12, wherein the confidence values do not meet the predetermined set of confidence criteria if:
    i) either the confidence value for the data structure is less than a first threshold and the confidence value for either portion of the data structure on either side of the largest gap exceeds the confidence value for the data structure; or
    ii) all of the confidence values are below a second threshold.
    14. A method according to any of claims 8 to 13, further comprising forming an image mask from the data structure representing detected text elements by setting pixels in the image mask to a value depending on the confidence value for each data structure representing candidate text elements added to the data structure representing detected text elements.
    15. A method according to claim 14, further comprising performing a thresholding algorithm on the image mask.
    16. A method according to claim 15, further comprising performing a morphological dilation algorithm on the thresholded image mask.
    17. A method according to any of claims 8 to 16, further comprising removing or obscuring text elements in the image data by modifying pixels in the image data relating to detected text elements according to the data structure representing detected text elements.
    18. A method according to claim 17, wherein the pixels are modified by an inpainting algorithm.
    19. A method of sharing image data comprising images of a subject generated by a medical imaging device, the method comprising:
    a) receiving a subject record comprising the image data and subject identification metadata, wherein the image data has been previously modified to remove any features identifying the subject;
    b) storing the subject record;
    c) receiving login details from a user;
    d) authenticating the login details to confirm that the user is the subject identified by the subject identification metadata; and
    e) receiving a sharing command from the user, the sharing command indicating a selected one of a plurality of sharing services over which the image data is to be shared with a third party, and sharing the image data over the selected sharing service.
    20. A method according to claim 19, further comprising validating the received subject record, prior to storing the subject record, by confirming that the subject record comprises all the image data specified in a manifest and/or the validity of the image data and/or the validity of a node identification number allocated to the medical imaging device for uniquely identifying the medical imaging device.
    21. A method according to claim 19 or 20, wherein step (b) comprises storing a unique subject record identification number in a database record along with the subject identification data and one or more uniform resource locators indicating the location of the image data in a separate file system.
    22. A method according to any of claims 19 to 21, further comprising transmitting the image data from the subject record to one or more e-mail addresses specified in the subject record.
    23. A method according to any of claims 19 to 22, wherein the sharing services comprise one or more social media services and/or e-mail.
    24. A method of sharing image data representing images of a subject generated by a medical imaging device, the method comprising a combination of the method of any of claims 1 to 18 followed by the method of any of claims 19 to 23.
    25. A system comprising one or more capture devices adapted to perform the method of any of claims 1 to 18, each of which is coupled to a respective medical imaging device, in use, and a remote storage device adapted to perform the method of any of claims 19 to 23, the remote storage device and the or each capture device together forming a network.
    26. A method of detecting text elements in image data, the method comprising:
    a) forming an edge mask comprising only edge pixels in the image data;
    b) for each row of the edge mask, forming a data structure representing candidate text elements consisting of the start and end positions in the row of contiguous horizontal edges; and
    c) for each data structure representing candidate text elements, calculating confidence values for the data structure and for each portion of the data structure on either side of the largest gap between adjacent contiguous horizontal edges; and either
    i) replacing the data structure with two data structures, each consisting of one of the portions of the data structure on either side of the largest gap and then repeating step (c), if the confidence values do not meet a predetermined set of confidence criteria; or
    ii) adding the data structure to a data structure representing detected text elements.
    27. A method according to claim 26, wherein the edge mask is formed in step (a) by applying an edge detection algorithm, such as a Canny edge detector, to the image data.
    28. A method according to claim 26 or claim 27, further comprising performing an adaptive thresholding algorithm on the edge mask prior to step (b) such that the edge pixels in the edge mask have a first binary value, all other pixels having a second binary value.
    29. A method according to claim 28, wherein the start position of a contiguous horizontal edge is detected in step (b) by detecting the transition along a row from pixels having the second binary value to pixels having the first binary value and/or by detecting a pixel having the first binary value in the left-hand most position along the row.
    30. A method according to claim 28 or claim 29, wherein the end position of a contiguous horizontal edge is detected in step (b) by detecting the transition along a row from pixels having the first binary value to pixels having the second binary value and/or by detecting a pixel having the first binary value in the right-hand most position along the row.
    31. A method according to any of claims 26 to 30, wherein the confidence values do not meet the predetermined set of confidence criteria if:
    i) either the confidence value for the data structure is less than a first threshold and the confidence value for either portion of the data structure on either side of the largest gap exceeds the confidence value for the data structure; or
    ii) all of the confidence values are below a second threshold.
    32. A method according to any of claims 26 to 31, further comprising forming an image mask from the data structure representing detected text elements by setting pixels in the image mask to a value depending on the confidence value for each data structure representing candidate text elements added to the data structure representing detected text elements.
    33. A method according to claim 32, further comprising performing a thresholding algorithm on the image mask.
    34. A method according to claim 33, further comprising performing a morphological dilation algorithm on the thresholded image mask.
    35. A method according to any of claims 26 to 32, further comprising removing or obscuring text elements in the image data by modifying pixels in the image data relating to detected text elements according to the data structure representing detected text elements.
    36. A method according to claim 35, wherein the pixels are modified by an inpainting algorithm.
    37. A method substantially as hereinbefore described with reference to the accompanying drawings.
    38. A system substantially as hereinbefore described with reference to the accompanying drawings.
GB1204686.8A 2012-03-16 2012-03-16 Removing or obscuring sensitive medical image Withdrawn GB2500264A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB1204686.8A GB2500264A (en) 2012-03-16 2012-03-16 Removing or obscuring sensitive medical image
PCT/GB2013/050671 WO2013136093A2 (en) 2012-03-16 2013-03-15 Image data storage and sharing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1204686.8A GB2500264A (en) 2012-03-16 2012-03-16 Removing or obscuring sensitive medical image

Publications (2)

Publication Number Publication Date
GB201204686D0 GB201204686D0 (en) 2012-05-02
GB2500264A true GB2500264A (en) 2013-09-18

Family

ID=46052071

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1204686.8A Withdrawn GB2500264A (en) 2012-03-16 2012-03-16 Removing or obscuring sensitive medical image

Country Status (2)

Country Link
GB (1) GB2500264A (en)
WO (1) WO2013136093A2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9405935B2 (en) * 2014-11-26 2016-08-02 Ncr Corporation Secure image processing
US9917898B2 (en) 2015-04-27 2018-03-13 Dental Imaging Technologies Corporation Hybrid dental imaging system with local area network and cloud
US9858696B2 (en) 2015-09-18 2018-01-02 International Business Machines Corporation Image anonymization using analytics tool
US10706958B2 (en) 2015-11-20 2020-07-07 Ikeguchi Holdings Llc Electronic data document for use in clinical trial verification system and method

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020026332A1 (en) * 1999-12-06 2002-02-28 Snowden Guy B. System and method for automated creation of patient controlled records
US6594393B1 (en) * 2000-05-12 2003-07-15 Thomas P. Minka Dynamic programming operation with skip mode for text line image decoding
US20020188187A1 (en) * 2001-06-07 2002-12-12 Jordan Sarah E. System and method for removing sensitive data from diagnostic images
US20050236474A1 (en) * 2004-03-26 2005-10-27 Convergence Ct, Inc. System and method for controlling access and use of patient medical data records
US20080002911A1 (en) * 2005-12-16 2008-01-03 Ori Eisen Methods and Apparatus for Securely Displaying Digital Images
US20070192137A1 (en) * 2006-02-01 2007-08-16 Ombrellaro Mark P Access control in an electronic medical record system
US20080118150A1 (en) * 2006-11-22 2008-05-22 Sreeram Viswanath Balakrishnan Data obfuscation of text data using entity detection and replacement
US20090310836A1 (en) * 2008-06-12 2009-12-17 Siemens Medical Solutions Usa, Inc. Automatic Learning of Image Features to Predict Disease
US20100082371A1 (en) * 2008-10-01 2010-04-01 General Electric Company, A New York Corporation Patient Document Privacy And Disclosure Engine
US20100131551A1 (en) * 2008-11-19 2010-05-27 Theladders.Com, Inc. System and method for managing confidential information
US20110022414A1 (en) * 2009-06-30 2011-01-27 Yaorong Ge Method and apparatus for personally controlled sharing of medical image and other health data
EP2375353A1 (en) * 2010-03-25 2011-10-12 RL Solutions Systems and methods for redacting sensitive data entries

Also Published As

Publication number Publication date
GB201204686D0 (en) 2012-05-02
WO2013136093A3 (en) 2013-12-05
WO2013136093A2 (en) 2013-09-19

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)