Saturday, January 14, 2012

Python script to download files via Google search

Just now someone asked me on chat whether there is a script that can download the files returned for a given Google search query.

Thinking about it, I suddenly remembered that I had coded such a script some time ago using the xgoogle Python library. So I dug up my script, and here it is.

As you know, I am lazy, so I used xgoogle rather than handling Google directly via httplib, urllib, etc. The script uses the getopt library to parse the options given to it (again, I am lazy).
(The xgoogle library can be downloaded at http://www.catonmat.net/blog/python-library-for-google-search/)

The general syntax of this script is:

python gsrchDwn.py --query "query_text" [--ftype file_extension] [--cnt continue_result_number] [--dir download_dir]

usage: python gsrchDwn.py --query maths made easy --ftype pdf
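The script's source isn't reproduced in this post, so the following is only a minimal Python 3 sketch of how these options could be parsed with getopt. The function name parse_args, the defaults (no file-type filter, resume from result 0) and the joining of leftover words into the query are assumptions, not the actual gsrchDwn.py code. It uses getopt.gnu_getopt rather than getopt.getopt because plain getopt stops at the first bare word, so in the unquoted example above "--ftype pdf" would never be recognised as an option.

```python
import getopt
import os


def parse_args(argv):
    """Parse gsrchDwn-style options (a sketch; the defaults are assumptions)."""
    # gnu_getopt lets options and bare words be mixed, so an unquoted query
    # such as: --query maths made easy --ftype pdf still picks up --ftype.
    opts, rest = getopt.gnu_getopt(argv, "", ["query=", "ftype=", "cnt=", "dir="])
    settings = {"query": None, "ftype": None, "cnt": 0, "dir": os.getcwd()}
    for name, value in opts:
        if name == "--query":
            settings["query"] = value
        elif name == "--ftype":
            settings["ftype"] = value
        elif name == "--cnt":
            settings["cnt"] = int(value)  # result number to resume from
        elif name == "--dir":
            settings["dir"] = value  # otherwise the current directory is used
    if settings["query"] is None:
        raise getopt.GetoptError("--query is required")
    # Assumption: leftover bare words belong to an unquoted multi-word query.
    if rest:
        settings["query"] = " ".join([settings["query"]] + rest)
    return settings
```

A script would call parse_args(sys.argv[1:]) and print the usage line on getopt.GetoptError. With the example arguments above, the sketch yields the query "maths made easy" and the file type "pdf".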

Important notes
1) If you do not get proper results, try putting the query in double quotes (").
2) This script needs the xgoogle library, found at http://www.catonmat.net/blog/python-library-for-google-search/

If --dir is not given, files are downloaded into the current directory.
If the script is stopped midway, you can continue from the last result number by using --cnt result_number.

This time I have been a good boy and also added status printing, which shows what percentage of the current file has been downloaded.
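The post doesn't show how the status printing is done. One plausible way in Python 3 is the reporthook callback of urllib.request.urlretrieve, which is called with the number of blocks transferred so far, the block size, and the total file size (-1 when the server sends no Content-Length). This is a minimal sketch under that assumption. The names percent_done, report and download are hypothetical and are not taken from gsrchDwn.py.

```python
import sys
import urllib.request


def percent_done(block_num, block_size, total_size):
    """Return the downloaded percentage, or None if the size is unknown."""
    if total_size <= 0:  # no Content-Length header, so no percentage
        return None
    # The last block may be partial, so cap the estimate at 100%.
    return min(100.0, block_num * block_size * 100.0 / total_size)


def report(block_num, block_size, total_size):
    """reporthook for urlretrieve: rewrite a single status line in place."""
    pct = percent_done(block_num, block_size, total_size)
    if pct is None:
        sys.stdout.write("\rDownloaded %d bytes" % (block_num * block_size))
    else:
        sys.stdout.write("\rDownloaded %5.1f%%" % pct)
    sys.stdout.flush()


def download(url, filename):
    """Fetch url into filename, printing progress as it goes."""
    urllib.request.urlretrieve(url, filename, reporthook=report)
    sys.stdout.write("\n")
```

Keeping the percentage arithmetic in its own function keeps the hook short and makes the calculation easy to check without touching the network.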

gsrchDwn.py can be downloaded HERE

Let me know if you have any comments.

10 comments:

Anonymous said...

Yet another piece of Python 2.x code :(

I really like Python 3, but getting anything to work there is almost impossible. Many of the libraries are deprecated, and fixing all the compatibility issues seems impossible. I even failed to get xgoogle to work.

I could really use this automatic download feature to collect research data :(

neo said...

Yes, Python 3.0 has compatibility issues. Lots of old libraries don't work with it, so I don't normally use 3.0.

vaibhav said...

Thanks for sharing.

The only problem is that the files are being saved as hidden files.

How do I change them to normal files?

Let me know

neo said...

Well, I didn't face this issue on my side. Could you please tell me which OS you are running this code on?

When you say hidden, you mean hidden from the OS with a hidden flag, am I right?

Anonymous said...

I fixed that issue. However, the script only downloads 180 files at once. How can I increase this number?
Also, what changes can be made to the script if someone wants to get back a certain number of files?
E.g. python gsrchDwn.py --ftype pdf --query Hello --num 10000

neo said...

Well, a counter needs to be implemented that checks the parameter for the number of files and breaks the loop once the counter reaches that number.
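The counter described in this reply could look roughly like the following minimal Python 3 sketch. It assumes the search results have already been turned into a list of file URLs and that the actual download is performed by a callable passed in as fetch. Both of these, and the name download_limited, are hypothetical and are not how version 2 of the script is necessarily written. This version counts only successful downloads, so broken links do not eat into the requested --num total. Counting every attempt would be an equally valid design choice.

```python
def download_limited(urls, max_files, fetch):
    """Call fetch(url) for each url until max_files downloads succeed.

    Returns the number of files actually downloaded.
    """
    downloaded = 0
    for url in urls:
        if downloaded >= max_files:
            break  # the counter has reached the requested number of files
        try:
            fetch(url)
        except IOError as err:  # failed downloads are skipped, not counted
            print("Unable to download %s: %s" % (url, err))
            continue
        downloaded += 1
    return downloaded
```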

neo said...

@Anonymous

Well, here you go: I made version 2 of the script, adding the new option you suggested.

http://infosec-neo.blogspot.in/2012/10/python-script-to-download-files-via.html

Michael said...

I think the script doesn't match the xgoogle library.
Whenever I run the script, I see ***Unable to Download file:IOError ....

Do you know of any workaround for this one?

neo said...

Google has changed its URL style, so there is a problem downloading the files. I am looking into it and will post a newer version of the script once I find a solution for the bug.

Max said...

Hi Neo,

Thanks for the great script. I was hoping this script would support advanced Google search operators, e.g.

--q "site:whatever.com keyword1+keyword2"

This would be great. Can you make it happen, buddy? :-)

Best regards