Metadata Revisited

by Veit Hailperin
time to read: 7 minutes

Metadata describes data. It has come up in the news particularly in the discussion of surveillance in general and of phone surveillance in particular. In this case, metadata describes when a call was placed, who called whom, the duration of the call, et cetera. A lowbrow politician might call this information not sensitive because the actual phone call is not recorded. See for yourself that this is nonsense:

There is more data in metadata than you would expect.

Metadata is everywhere. Current file systems, so-called journaling file systems, store metadata, e.g. the location on the disk and the size of stored data. Images contain metadata about their size and date of creation. If a photo is taken with a mobile phone, it is likely to contain the location where it was taken. Documents often include the author and the name of the document in their metadata.

Blessing in Disguise

Metadata enables new functionality. A picture gallery can display photos on a map using their geotags. Looking for photos from a trip to Asia becomes very easy. A published article contains its name and author anyway, but providing this data in the metadata as well enables a library to classify, sort and search documents with greater convenience. Metadata by itself is not bad.

Know the Ropes

Metadata turns problematic if it contains information that wasn’t meant to be shared. This goes from bad to worse when people don’t even know what is contained inside the photos and documents they publish. It is not a new problem, but rather one that faded into oblivion in light of other, attention-grabbing news.

Security consultants as well as evil hackers look to use everything in ways it isn’t meant to be used. The intent is not always malicious, but most often it is. Many companies put PDF files on their websites that contain a wide variety of content, e.g. software installation howtos, mobile phone application forms, quarterly figures or flight tickets. These documents don’t need an author in their metadata, but they often have one anyway. An attacker obtains the name of an employee this way, which he can later abuse in a social engineering attack. Sometimes the internal username is used as the author, which can be misused to deduce more information. The product with which the document was created is also interesting for an attacker. Detailed information might help him create a backdoored PDF, which could be sent as a job application. These are merely examples of potentially sensitive data and a few ways of misusing it. The range of possible fraudulent use is wide.
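To make this concrete, here is a minimal sketch of how little effort it takes to pull such fields out of a downloaded PDF. It is an illustration only, not how any particular tool works: the function name and the sample bytes are made up, and the regular expression only finds uncompressed literal strings with plain ASCII or Latin-1 content. UTF-16 or hex-encoded strings, compressed object streams, encrypted files and XMP metadata are not handled here.

```python
import re

# Entries of the PDF document information dictionary discussed above.
INFO_KEYS = ("Author", "Creator", "Producer")


def _unescape(raw):
    """Undo the escapes \\( \\) and \\\\ of a PDF literal string (nothing else)."""
    return re.sub(rb"\\([()\\])", rb"\1", raw).decode("latin-1")


def extract_info_fields(pdf_bytes):
    """Return the Author/Creator/Producer values found as literal strings.

    Hypothetical helper for illustration; a real audit should use a PDF parser.
    """
    found = {}
    for key in INFO_KEYS:
        # "/Key (value)" where value may contain backslash-escaped characters.
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(((?:\\.|[^\\()])*)\)"
        match = re.search(pattern, pdf_bytes, flags=re.DOTALL)
        if match:
            found[key] = _unescape(match.group(1))
    return found


# Made-up fragment of a PDF file, just enough to exercise the function.
sample = (
    b"%PDF-1.4\n1 0 obj\n"
    b"<< /Author (jdoe) /Creator (Writer) /Producer (ExamplePDF 1.0) >>\n"
    b"endobj\n"
)
print(extract_info_fields(sample))
```

An internal username like the made-up `jdoe` and a precise producer string are exactly the kind of detail an attacker is after.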

Best Thing Since Sliced Bread

One of the reasons why PDF metadata is missing from many security audit reports is the additional work required to identify all metadata of all PDF files on a website. To make sure this excuse goes extinct soon, I wrote an extension called PDF Metadata for Burp Suite, the de facto standard web application audit tool. The extension reads metadata (document information and XMP metadata) conveniently and displays what it finds in a clearly arranged form.

Screenshot of the Extension

Starting with version 0.5 of the extension, there is exactly one option a user can choose: the trade-off between speed and thoroughness. Reading every response to every request consumes plenty of time and resources. Since the vast majority of responses do not include PDF files, one can opt for Scan Fast, which only analyzes requests containing the string .pdf. If the web application has requests that generate PDF files rather than just linking to them, I recommend picking the option Scan Thoroughly. This analyzes all responses and searches for PDF files. I recommend you do this at the end of your day's testing, because it will slow down Burp noticeably.
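The difference between the two modes can be sketched in a few lines. This is not the extension's actual code, just an assumed approximation of the decision it has to make. The function name, the case-insensitive match on the URL and the check for the `%PDF-` header within the first 1024 bytes of the body are all assumptions made for illustration.

```python
PDF_MAGIC = b"%PDF-"  # every PDF file starts with this header


def should_analyze(url, body, thorough):
    """Decide whether a response is worth parsing for PDF metadata.

    Hypothetical helper: `url` is the request URL, `body` the response body.
    """
    if not thorough:
        # Fast mode: cheap string check on the request only, the body is
        # never inspected. Generated PDFs behind other URLs are missed.
        return ".pdf" in url.lower()
    # Thorough mode: look at every response body for the PDF header.
    # Only the first 1024 bytes are searched to keep the check bounded.
    return PDF_MAGIC in body[:1024]


# A generated PDF behind a URL that does not mention ".pdf":
generated = ("https://example.com/export?id=7", b"%PDF-1.7\n...")
print(should_analyze(*generated, thorough=False))
print(should_analyze(*generated, thorough=True))
```

The example at the end shows why the thorough mode exists: the fast check never looks at the body, so a PDF generated on the fly slips through.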

Choose your setting

This extension is licensed under the GNU General Public License and is available for free. For those who like their life to be easy, simply install it from the official BApp Store.

Full Monty: Strip Metadata

How can we prevent the involuntary leakage of information through metadata? Sensitive information should always be stripped from documents. Which data is classified as sensitive depends on the organization and in some cases on the document. A simple solution is to enforce the use of a PDF sanitizer on each document that is shared with others. This can be enforced internally through policies.
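What stripping means at the byte level can be sketched as follows. This is a toy illustration under strong assumptions, not a replacement for a real sanitizer. It overwrites the values of a few document information entries with spaces of the same length, so that the byte offsets in the file's cross-reference table stay valid. The function name and the key list are made up, and XMP metadata, compressed object streams, encrypted files, hex or UTF-16 strings and older revisions kept by incremental updates are all left untouched.

```python
import re

# Entries to blank; which ones count as sensitive is a policy decision.
SENSITIVE_KEYS = (b"Author", b"Creator", b"Producer")


def blank_info_fields(pdf_bytes):
    """Overwrite selected literal-string values with spaces of equal length."""

    def _blank(match):
        # Keep "/Key (" and ")" and replace only the value in between.
        # Same byte length means no offset in the file changes.
        return match.group(1) + b" " * len(match.group(2)) + b")"

    for key in SENSITIVE_KEYS:
        pattern = rb"(/" + key + rb"\s*\()((?:\\.|[^\\()])*)\)"
        pdf_bytes = re.sub(pattern, _blank, pdf_bytes, flags=re.DOTALL)
    return pdf_bytes


# Made-up fragment of a PDF file.
original = (
    b"%PDF-1.4\n1 0 obj\n"
    b"<< /Author (jdoe) /Title (Q3 figures) /Producer (ExamplePDF 1.0) >>\n"
    b"endobj\n"
)
cleaned = blank_info_fields(original)
print(cleaned.decode("latin-1"))
```

The title survives because it is not in the key list, which reflects the point above: what gets removed should follow from what the organization classifies as sensitive.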

Employees should be informed about the dangers of metadata and the ways to sanitize documents. This should prevent them from sending out documents as e-mail attachments that still contain metadata with sensitive information. Some office programs offer ways to check for metadata or even remove some of it. All of them are easy to use. But, alas, every rose has its thorn: most tools won’t remove the information about Producer and Creator.

Another way to remove metadata is to use a plugin in the website's content management system that strips the metadata from all uploads. This way a document will be free of metadata on publication.

For those who like the humorous approach, I recommend Der Metadatensauger.

About the Author

Veit Hailperin

Veit Hailperin has been working in information security since 2010. His research focuses on network and application layer security and the protection of privacy. He presents his findings at conferences.
