  • Yogi


After a file or document is scanned, the resulting PDF may not replicate the physical source. 'Cleaning the PDF' is a short process that remedies this before further processing (page-labelling, bookmarking and hyperlinking).

At the outset, let me clarify what I mean by "cleaning". The 'cleaning' I speak of in this blog is not about making your PDFs pretty. It is about ensuring that the page order and page orientation of the digital file match those of the physical file, by appropriately dealing with missing pages, extra pages, misplaced pages, wrongly oriented pages, etc.


There are several reasons why a raw scanned PDF may not replicate the source physical document. Illustratively, the following circumstances may be noted:

  • The physical file is single-sided and a two-sided scan is performed. This will result in an extra blank page after each page in the PDF file.

  • The physical file is partly single-sided and partly double-sided, and a two-sided scan is performed. In this case, the single-sided part of the file will have an extra blank page after each page.

  • The orientation of pages is wrong.

  • The scan was not properly performed, resulting in a jumbled page sequence.
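To make the first two circumstances concrete, here is a minimal Python sketch of the page selection involved. It works only on page numbers, not on an actual PDF, and the function name `pages_to_keep` and its parameters are my own illustration, not a feature of any PDF editor. It assumes that, within a single-sided stretch of the file, the scanner produced the front of each sheet followed by its blank back.

```python
def pages_to_keep(total_pages, single_sided_ranges):
    """Return the 1-based PDF page numbers to keep after a two-sided scan.

    single_sided_ranges is a list of (first, last) PDF page ranges, 1-based
    and inclusive, that were scanned from single-sided sheets. Within each
    range the 2nd, 4th, ... pages are assumed to be blank backs.
    """
    drop = set()
    for first, last in single_sided_ranges:
        # Every second page of the range, starting with the first back side.
        drop.update(range(first + 1, last + 1, 2))
    return [p for p in range(1, total_pages + 1) if p not in drop]


# Whole file single-sided: every second page is blank.
print(pages_to_keep(6, [(1, 6)]))

# Only PDF pages 1-4 came from single-sided sheets; the rest is double-sided.
print(pages_to_keep(8, [(1, 4)]))
```

The list that comes out is simply the set of pages you would keep (or, read the other way, the pages you would delete) with the page organisation tools of your PDF editor.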

Sometimes, even if the PDF exactly replicates the source file, such a scan may not be suitable for digital use. For example, there might be extra blank pages or miscellaneous pages (such as half-page dockets in some Courts) in the physical file which are not part of the file's running page numbering.

The above situations are only illustrative of the wide variety of anomalous circumstances that you may come across after scanning. A raw PDF cannot be processed for seamless page-labelling, bookmarking and hyperlinking without proper cleaning.


All PDF editing software contains 'Page Organisation' tools that can be applied in various permutations and combinations.

Using these tools, a scanned PDF can be cleaned very quickly.

Please note that there is no one-stop combination to clean every PDF. Depending on the differences between the scanned file and its source file, you will have to select an appropriate combination of the page organisation tools.

The process is very simple. Once your clerk understands the rationale behind ‘cleaning the PDF’, he will instinctively know how to execute the cleaning in any circumstance.


While cleaning the PDF, one might encounter the following two situations.

Missing pages: There might be pages missing in the source physical file itself.

Extra page numbering: Sometimes, if extra pages are added after numbering and photocopying, clerks resort to shortcut page numbering to avoid renumbering the whole file. For example, if 4 pages are added after page 35, the new pages are numbered 35A, 35B, 35C and 35D.

In a physical file, missing pages or extra page numbering do not alter the page order of the rest of the file. However, in a scanned file, they shift the position of every page thereafter, so the PDF page count no longer matches the written page numbers.

Both these situations can easily be remedied during page-labelling.

If you come across a document with ‘missing pages’ and/or ‘extra page numbering’ during the cleaning process, just make a note of the relevant pages for further processing during page-labelling.
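The shift described above can be seen in a small Python sketch. It only builds the list of written page numbers in the order the sheets sit in the physical file; the function `physical_labels` and its parameters are hypothetical names of my own, and the lettering assumes no more than 26 pages are inserted at one place.

```python
def physical_labels(last_page, extra=None, missing=()):
    """List the written page numbers of a physical file, in sheet order.

    extra maps a page number to the count of lettered pages inserted after
    it (for example {35: 4} gives 35A to 35D). missing lists page numbers
    that are absent from the physical file.
    """
    extra = extra or {}
    labels = []
    for n in range(1, last_page + 1):
        if n not in missing:
            labels.append(str(n))
        for i in range(extra.get(n, 0)):
            # Letters A, B, C, ... for pages inserted after page n.
            labels.append(f"{n}{chr(ord('A') + i)}")
    return labels


labels = physical_labels(40, extra={35: 4})
# Position in the scanned PDF (1-based) of the page numbered 36.
print(labels.index("36") + 1)
```

With four pages inserted after page 35, the page numbered 36 is the 40th page of the scanned PDF. That gap is exactly what your note of 'missing pages' and 'extra page numbering' lets you account for at the page-labelling stage.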


Cleaning should be performed on all types of documents.

‘Cleaning the PDF’ is absolutely essential where the source document/file has running page numbers. Without cleaning, the process of page-labelling becomes cumbersome and prone to errors.


At the end of cleaning, the objective is to ensure that:

1) All the pages are in the same serial order of pagination as the source file.

2) All pages are in the same orientation as the source file.
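If you like, the two objectives can be expressed as a simple check. The sketch below is only an illustration: it assumes someone has noted the written page number and the rotation (in degrees) of each PDF page, and the function names `first_mismatch` and `pages_needing_rotation` are my own.

```python
from itertools import zip_longest


def first_mismatch(observed, expected):
    """Return (pdf_position, observed_label, expected_label) for the first
    page that is out of order, or None if the two sequences agree."""
    pairs = zip_longest(observed, expected)
    for position, (got, want) in enumerate(pairs, start=1):
        if got != want:
            return position, got, want
    return None


def pages_needing_rotation(rotations):
    """Return the 1-based positions of pages whose noted rotation is not
    upright (0 degrees)."""
    return [i for i, r in enumerate(rotations, start=1) if r % 360 != 0]


# Pages 3 and 4 were swapped during scanning.
print(first_mismatch(["1", "2", "4", "3"], ["1", "2", "3", "4"]))

# The second and fourth pages were scanned sideways or upside down.
print(pages_needing_rotation([0, 90, 0, 180]))
```

A cleaned PDF is one where the first check returns nothing and the second returns an empty list.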

If you want your electronic case files to be OCR-enabled or to have any other extra digital feature, you may perform the necessary tasks at this stage through your PDF editing software. These methods greatly enhance the experience of using a PDF file. But, please note that these methods, in my opinion, are not compulsory.
