• Yogi

CLEANING A PDF AFTER SCANNING





After a file or document is scanned, the resulting PDF may not faithfully replicate the physical source. 'Cleaning the PDF' is a short process to remedy this before further processing (page-labelling, bookmarking and hyperlinking).


At the outset, let me clarify what I mean by "cleaning". The 'cleaning' I speak of in this blog is not about making your PDFs pretty. It is about ensuring that the page order and page orientation of the digital file match those of the physical file, by appropriately dealing with missing pages, extra pages, misplaced pages, wrongly oriented pages, etc.



WHY SHOULD YOU CLEAN


There are several reasons why a raw scanned PDF may not replicate the source physical document. Illustratively, the following circumstances may be noted:


  • The physical file is single-sided and a two-sided scan is performed. This will result in an extra blank page after each page in the PDF file.

  • The physical file is partly single-sided and partly double-sided, and a two-sided scan is performed. In this case, the single-sided part of the file will have an extra blank page after each page.

  • The orientation of pages is wrong.

  • The scan was not performed properly, resulting in a jumbled page sequence.
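To make the first two circumstances concrete, here is a minimal Python sketch. It works only on page positions (1-based numbers), not on an actual PDF, and the function name `positions_to_keep` and its parameters are hypothetical, chosen purely for illustration. It assumes the scanner emits the front and then the back of every sheet, so that each single-sided sheet produces one real page followed by one blank page.

```python
def positions_to_keep(total_scanned, single_sided_spans):
    """Return the 1-based scanned positions that are NOT blank backs.

    total_scanned: number of pages in the raw two-sided scan.
    single_sided_spans: list of (first, last) scanned positions, inclusive,
        covering the sheets that were printed on one side only. Within such
        a span the scanner is assumed to emit front, blank back, front,
        blank back, ... starting with a front at `first`.
    """
    blanks = set()
    for first, last in single_sided_spans:
        # Every second position in the span, starting at first + 1,
        # is the blank back of a single-sided sheet.
        blanks.update(range(first + 1, last + 1, 2))
    return [p for p in range(1, total_scanned + 1) if p not in blanks]


# Fully single-sided file of 3 sheets, scanned two-sided (6 scanned pages).
print(positions_to_keep(6, [(1, 6)]))    # [1, 3, 5]

# Positions 1-4 come from double-sided sheets, 5-10 from single-sided ones.
print(positions_to_keep(10, [(5, 10)]))  # [1, 2, 3, 4, 5, 7, 9]
```

The positions returned are the ones to retain; everything else is a blank page to delete with your PDF editor's page organisation tools.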


Sometimes, even if the PDF exactly replicates the source file, the scan may not be suitable for digital use. For example, there might be extra blank pages or miscellaneous pages (such as half-page dockets in some Courts) in the physical file which are not part of the file's running page numbering.


The above situations are only illustrative of the wide variety of anomalous circumstances that you may come across after scanning. A raw PDF cannot be processed for seamless page-labelling, bookmarking and hyperlinking without proper cleaning.



HOW TO CLEAN


All PDF editing software contains 'Page Organisation' tools, which let you delete, insert, move and rotate pages in various permutations and combinations.


Using these tools, a scanned PDF can be cleaned very quickly.


Please note that there is no one-stop combination to clean every PDF. Depending on the differences between the scanned file and its source file, you will have to select an appropriate combination of the page organisation tools.
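As an illustration of what such a combination looks like, the sketch below treats a cleaning job as an ordered plan of delete, rotate and move steps applied to a list of pages. This is only a model of the idea under my own assumptions: the function `clean`, the step names and the page labels are hypothetical and do not correspond to any particular PDF editor.

```python
def clean(pages, plan):
    """Apply a cleaning plan to a list of (label, rotation_degrees) pages.

    plan is a list of steps applied in order. Positions are 1-based and
    refer to the page list as it stands when that step runs:
      ("delete", position)
      ("rotate", position, degrees)        # clockwise, multiples of 90
      ("move", from_position, to_position)
    """
    pages = list(pages)  # work on a copy, leave the input untouched
    for step in plan:
        action = step[0]
        if action == "delete":
            del pages[step[1] - 1]
        elif action == "rotate":
            label, rotation = pages[step[1] - 1]
            pages[step[1] - 1] = (label, (rotation + step[2]) % 360)
        elif action == "move":
            page = pages.pop(step[1] - 1)
            pages.insert(step[2] - 1, page)
        else:
            raise ValueError(f"unknown action: {action}")
    return pages


# Raw scan: a blank page at position 2, page 2 misplaced and lying sideways.
raw = [("p1", 0), ("blank", 0), ("p3", 0), ("p2", 90)]
plan = [("delete", 2), ("move", 3, 2), ("rotate", 2, 270)]
print(clean(raw, plan))  # [('p1', 0), ('p2', 0), ('p3', 0)]
```

A different raw scan would need a different plan, which is exactly why no single combination fits every PDF.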


The process is very simple. Once your clerk understands the rationale behind ‘cleaning the PDF’, they will instinctively know how to execute the cleaning in any circumstance.



TWO EXCEPTIONAL SITUATIONS


While cleaning the PDF, one might encounter the following two situations.


Missing pages: There might be pages missing in the source physical file itself.


Extra page numbering: Sometimes, if extra pages are added after numbering and photocopying, clerks resort to shortcut page numbering to avoid renumbering the whole file. For example, if 4 pages are added after page 35, the new pages are numbered 35A, 35B, 35C and 35D.


In a physical file, missing pages or extra page numbering do not alter the page order of the rest of the file. However, in a scanned file, they shift the position of every page thereafter, so the position of a page in the PDF no longer matches its printed page number.
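The shift can be seen with a small sketch. The function `labels_in_pdf_order` below is hypothetical and works only on printed page numbers, not on a real PDF; it assumes that extra pages take the suffixes A, B, C and so on (at most 26 per page) and that they sit immediately after the page they were added to.

```python
from string import ascii_uppercase


def labels_in_pdf_order(last_number, missing=(), extras=None):
    """List printed page labels in the order they appear in the scan.

    last_number: highest printed page number in the physical file.
    missing: printed page numbers absent from the physical file.
    extras: dict mapping a page number to the count of pages added
        after it (numbered nA, nB, ...; assumed at most 26).
    """
    extras = extras or {}
    labels = []
    for n in range(1, last_number + 1):
        if n not in missing:
            labels.append(str(n))
        # Pages inserted after page n carry the shortcut numbering.
        for suffix in ascii_uppercase[: extras.get(n, 0)]:
            labels.append(f"{n}{suffix}")
    return labels


# A file numbered 1 to 40, with page 12 missing and 4 pages added after 35.
labels = labels_in_pdf_order(40, missing={12}, extras={35: 4})
print(len(labels))              # 43
print(labels.index("36") + 1)   # 39: printed page 36 is the 39th PDF page
```

Pages 1 to 11 sit where you would expect, pages 13 to 35 are one position early, and everything after 35D is three positions late.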


Both these situations can easily be remedied during page-labelling.


If you come across a document with ‘missing pages’ and/or ‘extra page numbering’ during the cleaning process, just make a note of the relevant pages for further processing during page-labelling.


IS CLEANING COMPULSORY ON ALL SCANNED DOCUMENTS


I have dealt with the various kinds of documents that are encountered during digitisation, and the possible combinations in performing CPBH (cleaning, page-labelling, bookmarking and hyperlinking).


Cleaning should be performed on all types of documents.


‘Cleaning the PDF’ is absolutely essential if the source document/file has running page numbers. Without cleaning, the process of page-labelling becomes cumbersome and prone to errors.



AIM AT THE END OF CLEANING


At the end of cleaning, the objective is to ensure that:


1) All pages are in the same serial order of pagination as the source file.


2) All pages are in the same orientation as the source file.
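In practice these two objectives are checked by eye, flipping through the physical file alongside the PDF. Purely as an illustration, the same check can be written down as a short sketch; the function `cleaning_problems` and the way pages are described (a label plus "portrait" or "landscape") are my own assumptions, not a feature of any PDF software.

```python
def cleaning_problems(pdf_pages, source_pages):
    """Compare the cleaned PDF with the source file, page by page.

    Both arguments are lists of (label, orientation) tuples in reading
    order. Returns a list of human-readable problems; an empty list
    means both objectives are met.
    """
    problems = []
    if len(pdf_pages) != len(source_pages):
        problems.append(
            f"page count differs: PDF has {len(pdf_pages)}, "
            f"source has {len(source_pages)}"
        )
    pairs = zip(pdf_pages, source_pages)
    for position, (pdf, src) in enumerate(pairs, start=1):
        if pdf[0] != src[0]:
            # Objective 1: same serial order as the source file.
            problems.append(
                f"position {position}: found page {pdf[0]}, expected {src[0]}"
            )
        elif pdf[1] != src[1]:
            # Objective 2: same orientation as the source file.
            problems.append(
                f"position {position}: page {pdf[0]} is {pdf[1]}, "
                f"expected {src[1]}"
            )
    return problems


source = [("1", "portrait"), ("2", "portrait"), ("3", "landscape")]
scan = [("1", "portrait"), ("2", "landscape"), ("3", "landscape")]
print(cleaning_problems(scan, source))
# ['position 2: page 2 is landscape, expected portrait']
```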


If you want your electronic case files to be OCR-enabled or to have any other additional digital feature, you may perform the necessary tasks at this stage through your PDF editing software. These features greatly enhance the experience of using a PDF file. But please note that, in my opinion, they are not compulsory.


Read the posts on Page-Labelling and Bookmarking & Hyperlinking.

I am a practicing advocate, with no professional technological expertise, operating a small stand-alone office. My digitization journey began in 2016. What started out as a simple pursuit to save my notes on judgments, eventually turned into a passion, changing the entire paradigm of my office. 

The information shared on this blog is a crystallised version of years of trial and error in various methods and work-flow implemented in my office. Please note that I am not promoting any hardware or software when I indicate them in this blog. My stress is only on the method. There are multiple alternatives of hardware and software for all tasks.

 

If you have any query, suggestion or wish to share your digitization methods, feel free to contact me at yogi@sanspaper.in

©2020 by SansPaper. All Rights Reserved