Script for Downloading Images and Links From a Web Page
- October 1st, 2010
- Posted in CrunchBang Linux . Debian . Linux . NCLUG . Open Source / FOSS . Ubuntu, Kubuntu, Xubuntu, Fluxbuntu, Nubuntu, Geubuntu . Web
Sometimes you may want to download the images that a web page links to, for instance when a thumbnail links to a larger version of the same image (the example URL near the end of this post is one such page). You may also want a list of every hyperlink referenced in a page. After running across Guillermo Garron’s article, in which he provides some creative commands for performing both tasks, I decided that it would be fun to write a script that does all of this for you. My Bash script is called “imageDownloader”; in addition to downloading images, it creates a text file containing all of the hyperlinks referenced from an HTML page. Please note that the downloaded images are not the images displayed on the page itself, but the images that the page links to.
Upon executing the script, the user is welcomed with a short message that explains what the program does, and gives the user a series of choices:
This program will allow you to do one of the following:
(1) List all hyperlinks referenced in a web page and store the list in a text file
(2) Download all images that are hyperlinked from a web page, such as when you would click on a thumbnail image in order to view a larger version of the same image.
*************************************************************************************************
This script relies on the program called "lynx", so if you don't already have it installed, you may want to quit (q) now and install "lynx".
*************************************************************************************************
What would you like to do?
Enter "1" to download a list of hyperlinks, "2" to download images that this page links to, or "q" to QUIT:
Enter the choice that suits your needs, and make sure that you already have “lynx” installed. Entering either option 1 or 2 will prompt you for the URL. It helps to use a terminal emulator that supports copy and paste; my personal favorite is Terminator, which incidentally allows you to split your terminal window into multiple panes. You will then be asked for a directory name, which is where the script saves either the text file containing the list of hyperlinks or the downloaded images, and then it begins working its magic. At the end, you’ll have the option to start over or quit the program.
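As a rough illustration of the two options described above, here is a minimal sketch of how the link list and the image list could be produced from lynx output. It is not the imageDownloader script itself: the function names `extract_links` and `filter_images`, the `links.txt` file name, and the list of image extensions are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the extension list are assumptions,
# not taken from the imageDownloader script.

# Read `lynx -dump -listonly` output on stdin and print one unique URL per line.
extract_links() {
    # Reference lines look like "   1. http://example.com/page"
    awk '$1 ~ /^[0-9]+\.$/ && $2 ~ /^https?:\/\// { print $2 }' | sort -u
}

# Keep only URLs that end in a common image extension (case-insensitive).
filter_images() {
    grep -Ei '\.(jpe?g|png|gif)$'
}

# Option 1: save the list of hyperlinks to a text file.
#   lynx -dump -listonly "$url" | extract_links > "$dir/links.txt"
# Option 2: download every linked image into a directory.
#   lynx -dump -listonly "$url" | extract_links | filter_images | wget -q -P "$dir" -i -
```

The two commented pipelines assume that `$url` and `$dir` hold the values entered at the prompts. Keeping the parsing in small functions that read standard input makes them easy to try on saved lynx output before touching the network.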
Note: Although the concepts used here are not overly difficult, writing this was a fun learning experience for me. If you are a more experienced coder and see places where I could improve my coding practices, please feel free to send me your suggestions and upgrades for this little program.
You can download or view the imageDownloader script here, or follow the process outlined below. You can save it without the “.txt” file extension if you like; I added the extension only to make the script viewable from the comfort of your web browser. Remember to make the file executable before running it.
$ wget http://www.hilltopyodeler.com/scripts/imageDownloader.txt
$ mv imageDownloader.txt imageDownloader
$ chmod +x imageDownloader
$ ./imageDownloader
When prompted to enter a URL, you might like to try using the example page that I used above for downloading images (copy/paste): http://ubuntustudio.org/screenshots
Happy downloading!
Hey,
This script works pretty well. I work on OS X and found that wget is not available there, so I replaced wget with curl -O and it worked.
Also, when I try using this script to download all pictures from a Facebook album (using the public link), it does not work.
Could you give me a heads-up on this?
Srayan.
@srayan, Facebook requires that you log in to view any content. When you log in, a cookie is written to your web browser that verifies that you have authenticated with their system. Your terminal emulator is not going to authenticate with Facebook and does not have a way to store their cookie, so there really is not a way that I am aware of to use a script like this to download content from a website that requires authentication like Facebook does. Also, the script is designed to look for href attributes in the source code and to grab the files that are being linked to (so long as the file types have been defined in the script, such as “.jpg” and “.JPG”, “.mov” and “.MOV”, etc.). Instead of using hrefs, Facebook uses a lot of JavaScript to access images, so the script doesn’t know how to handle that. Sorry.
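On the wget point raised in the comment above: one way to make a script like this work on systems without wget is to fall back to curl when wget is missing. The helper below is a hypothetical sketch (the name `fetch` and its two arguments are assumptions for this example, not part of imageDownloader).

```shell
# Sketch only: `fetch` is a hypothetical helper, not part of imageDownloader.
# Download one URL into a directory, using wget if present, otherwise curl.
fetch() {
    local url=$1
    local dir=$2
    if command -v wget >/dev/null 2>&1; then
        wget -q -P "$dir" "$url"
    elif command -v curl >/dev/null 2>&1; then
        # curl -O saves into the current directory, so change into $dir first.
        (cd "$dir" && curl -sS -O "$url")
    else
        echo "fetch: neither wget nor curl is installed" >&2
        return 1
    fi
}
```

A call such as `fetch "$url" "$dir"` would then behave the same way on Linux and OS X, as long as the target directory already exists.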
wideroee.wordpress.com
Does not work with a 64-bit system. Any solution?
@abcde
I just built a 64-bit Debian (Squeeze) system to test this out and did not encounter any issues. Did you make sure to install lynx? Lynx is required in order for the program to work.
Oh nice, this works really well. Thank you
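Since a missing lynx is the most likely cause of a failure like this, a quick check before running the script can rule it out. The helper name `require_tool` below is hypothetical and the snippet is only a sketch of one way to do the check.

```shell
# Sketch only: `require_tool` is a hypothetical helper, not part of imageDownloader.
# Return non-zero and print a message if the named program is not on the PATH.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Missing dependency: $1" >&2
        return 1
    fi
}

# Example: stop early if lynx is unavailable.
#   require_tool lynx || exit 1
```

On Debian and Ubuntu, lynx can usually be installed with `sudo apt-get install lynx`.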
Thanks for the script, I made a video on it and showed people how it works. I added your links to your site and the script. Works great!!!
Here’s the link for the video:
http://www.youtube.com/watch?v=8Wvtx0Wi3UI&feature=youtu.be
Thank you!