Get Hindustan Times/Dainik Full ePaper as PDF!!
I have been outside India for quite some time now, but I still love to read Indian news! There is nothing like it! Reading the epaper is even more enjoyable. Times Of India used to provide it for free, but now you have to pay to get the full paper. So I decided to switch to Hindustan Times. The epaper site of HT is good, but quite buggy at times. Sometimes it just stops responding or, even worse, spits out an SQL error! Luckily it provides a PDF version, but you have to go through their painful and extremely buggy registration process! Even after that, you only get to download the PDF of a single page at a time! LAME!!!!
After fiddling a little with the site, I figured out that the PDFs are actually just static objects located on their server, and can be retrieved via a simple GET request. Also, their URLs all follow the same construction pattern. This was enough information! I got down to work last night, eager to use my Java skills and the recently mastered Java concurrency API! The result? A GUI program which retrieves all the PDFs and merges them into a single PDF, using PDFBox 😀. It does not end there. The program currently supports two languages (Hindi and English) and 4 cities per language! Plus you can get archived papers too.
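For the curious, here is a minimal sketch of the idea in Java. It is not the program's actual code, and everything named in it is made up for illustration: the `EpaperFetcher` class, the `buildPageUrls`, `httpGet` and `fetchAll` helpers, and above all the URL layout (`<base>/<date>/page_NN.pdf`), since the real pattern is not given in this post. It only shows how predictable URLs plus a thread pool let you pull all the pages in parallel while keeping them in page order.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class EpaperFetcher {

    // Builds one URL per page. ASSUMPTION: the layout "<base>/<date>/page_NN.pdf"
    // is hypothetical; the real site's pattern is not stated in the post.
    public static List<String> buildPageUrls(String baseUrl, String date, int pageCount) {
        List<String> urls = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            urls.add(String.format(Locale.ROOT, "%s/%s/page_%02d.pdf", baseUrl, date, page));
        }
        return urls;
    }

    // Plain HTTP GET that returns the response body as bytes.
    public static byte[] httpGet(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new RuntimeException("Failed to fetch " + url, e);
        }
    }

    // Downloads all URLs on a fixed-size thread pool. The results come back
    // in the same order as the input list, because the futures are collected
    // in submission order rather than completion order.
    public static List<byte[]> fetchAll(List<String> urls,
                                        Function<String, byte[]> fetcher,
                                        int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<byte[]> task = () -> fetcher.apply(url);
                futures.add(pool.submit(task));
            }
            List<byte[]> pages = new ArrayList<>();
            for (Future<byte[]> future : futures) {
                pages.add(future.get()); // blocks until this page is done
            }
            return pages;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a real base URL you would call something like `EpaperFetcher.fetchAll(urls, EpaperFetcher::httpGet, 4)`. The fetcher is passed in as a function so that the download logic can be tried out without hitting the network. Merging the downloaded pages into one file is left out of the sketch; PDFBox's `PDFMergerUtility` is one option for that step.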
Eager to try?
The Java Web Start version is available here. This will download the jar and run it. The jar is self-signed by me :).
The conventional jar is available here. It is an executable jar, so just double-click it to run. You will need Java Runtime 1.5+ installed on your computer.
Known Issues: You have to exit the program before you can start using the ePaper. This is a problem with PDFBox. Hopefully they will fix it in the next version.
Let me know if you run into any problems.
Edit: The paper is about 20 MB in size! So for users with a slow internet connection, it will take quite some time! Have patience!