Just now someone asked me on chat whether there is a script that can download files for a given Google search query.
While thinking about it, I suddenly remembered that I had coded such a script some time ago using the xgoogle library for Python. So I searched for my script, and here it is.
As you know, I am lazy, so I used xgoogle instead of handling Google directly via httplib or urllib. My script uses the getopt library to parse the options given to it (again, I am lazy).
(The xgoogle library can be downloaded at http://www.catonmat.net/blog/python-library-for-google-search/)
The general syntax of this script is:
python gsrchDwn.py --query "query_text" [--ftype file_extension] [--cnt continue_result_number] [--dir download_dir]
Example usage: python gsrchDwn.py --query maths made easy --ftype pdf
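For anyone curious how getopt handles these long options, here is a minimal sketch. It is not the code from gsrchDwn.py. The function name `parse_options` and the defaults (result number 0, current directory) are my assumptions, based on the notes below.

```python
import getopt
import os


def parse_options(argv):
    """Parse gsrchDwn.py-style options into a dict (a sketch, not the original code)."""
    # Long options only; a trailing "=" means the option takes a value.
    opts, _args = getopt.getopt(argv, "", ["query=", "ftype=", "cnt=", "dir="])
    # Assumed defaults: start at result 0 and save into the current directory.
    options = {"query": None, "ftype": None, "cnt": 0, "dir": os.getcwd()}
    for name, value in opts:
        if name == "--query":
            options["query"] = value
        elif name == "--ftype":
            options["ftype"] = value
        elif name == "--cnt":
            options["cnt"] = int(value)  # result number to continue from
        elif name == "--dir":
            options["dir"] = value
    if not options["query"]:
        raise getopt.GetoptError("--query is required")
    return options
```

You would call it as `parse_options(sys.argv[1:])`. getopt accepts both `--query value` and `--query=value`, and it raises `getopt.GetoptError` for unknown options.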
Important notes:
1) If you do not get proper results, try putting the query in double quotes (").
2) This script needs the xgoogle library, available at http://www.catonmat.net/blog/python-library-for-google-search/
3) If --dir is not given, files are downloaded into the current directory.
4) If the script is stopped midway, you can resume from the last result number with --cnt result_number.
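Notes 3 and 4 boil down to two small things: a default download directory and skipping results you already have. Here is a rough sketch of both. `plan_downloads` is a hypothetical helper, `urls` stands in for the result URLs xgoogle returns, and numbering results from 0 is my assumption, not necessarily what the script does.

```python
import itertools
import os
from urllib.parse import urlparse


def plan_downloads(urls, start=0, download_dir=None):
    """Yield (result_number, url, local_path), skipping results before `start`.

    Hypothetical helper: results are assumed to be numbered from 0
    in the order they are returned.
    """
    if download_dir is None:
        download_dir = os.getcwd()  # no --dir given: use the current directory
    # Resume support (--cnt): drop every result numbered below `start`.
    for number, url in itertools.islice(enumerate(urls), start, None):
        # Name the file after the last path segment, with a fallback.
        name = os.path.basename(urlparse(url).path) or "result_%d" % number
        yield number, url, os.path.join(download_dir, name)
```

Printing the result number before each download is what makes `--cnt` useful. If the script dies, you know where to restart.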
This time I have been a good boy and also added status output that shows what percentage of the current file has been downloaded.
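If you want similar percentage output in your own code, the standard library's `urllib.request.urlretrieve` takes a `reporthook` that is called with the block number, block size and total size. Here is a sketch in Python 3. It is not necessarily how gsrchDwn.py does it, and the function names are my own.

```python
import sys
import urllib.request


def percent_done(block_num, block_size, total_size):
    """Percentage of the file downloaded so far, or None if the size is unknown."""
    if total_size <= 0:  # urlretrieve passes -1 when there is no Content-Length
        return None
    # The last block is usually partial, so clamp to 100.
    return min(100.0, block_num * block_size * 100.0 / total_size)


def report_progress(block_num, block_size, total_size):
    """reporthook for urlretrieve: overwrite one status line with the progress."""
    pct = percent_done(block_num, block_size, total_size)
    if pct is None:
        sys.stdout.write("\rDownloaded %d bytes" % (block_num * block_size))
    else:
        sys.stdout.write("\rDownloaded %5.1f%%" % pct)
    sys.stdout.flush()


def download(url, path):
    """Download url to path, printing progress as it goes."""
    urllib.request.urlretrieve(url, path, reporthook=report_progress)
    sys.stdout.write("\n")
```

The `\r` returns the cursor to the start of the line, so the percentage updates in place instead of flooding the terminal.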
gsrchDwn.py can be downloaded HERE.
Let me know if you have any comments.
10 comments:
Yet another piece of Python 2.x code :(
I really like Python 3, but getting anything to work there is almost impossible. Many of the libraries are deprecated, and fixing all the compatibility issues seems impossible. I can't even get xgoogle to work.
I could really use this automatic download tool to collect research data :(
Yes, Python 3.0 has compatibility issues. Lots of old libraries don't work with it, so I don't normally use 3.0.
Thanks for sharing.
The only problem is that the files are being saved as hidden files.
How do I change them to normal files?
Let me know.
Well, I didn't face this issue on my side. Could you please tell me which OS you are running this code on?
When you say hidden, you mean hidden from the OS with a hidden flag, am I right?
I fixed that issue. However, the script only downloads 180 files at a time. How can I increase this number?
Also, what changes could be made to the script if someone wants to get back a certain number of files?
E.g. python gsrchDwn.py --ftype pdf --query Hello --num 10000
Well, a counter needs to be implemented that checks the parameter for the number of files and breaks out of the loop once the counter reaches that number.
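The counter idea looks roughly like this. It is only a sketch: `download_limited` and the `download` callable are hypothetical stand-ins for the script's own loop and per-file download code, and `max_files` plays the role of the suggested `--num` value.

```python
def download_limited(urls, max_files, download):
    """Call download(url) for each URL, stopping after max_files successes.

    `download` is a hypothetical callable standing in for the script's
    per-file download code; it should return True when the file was saved.
    """
    count = 0
    for url in urls:
        if count >= max_files:
            break  # the --num limit has been reached
        if download(url):
            count += 1  # only count files that were actually saved
    return count
```

Counting only successful downloads means a few broken links don't eat into the limit. Whether that is what you want is a design choice.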
@Anonymous
Well, here you go: I made version 2 of the script, adding the new option you suggested.
http://infosec-neo.blogspot.in/2012/10/python-script-to-download-files-via.html
I think the script doesn't match the xgoogle library.
Whenever I run the script, I see ***Unable to Download file:IOError ....
Do you know any workaround for this?
Google has changed its URL style, so there is a problem with downloading the files. I am looking into it and will post a newer version of the script when I find a fix for the bug.
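I don't know exactly which URL change breaks the script. One guess, and it is only an assumption, is that result links come back wrapped in a Google redirect of the form `/url?q=<target>&...`, so the script tries to download the redirect instead of the file. Under that assumption, a sketch for unwrapping such links with the standard `urllib.parse` module:

```python
from urllib.parse import parse_qs, urlparse


def unwrap_google_redirect(link):
    """Return the target URL if `link` looks like a Google /url?q=... redirect.

    Assumption: wrapped links use the path /url with the real address in
    the q= (or url=) parameter; any other link is returned unchanged.
    """
    parsed = urlparse(link)
    if parsed.path == "/url":
        params = parse_qs(parsed.query)  # also undoes %-encoding
        for key in ("q", "url"):
            if key in params:
                return params[key][0]
    return link
```

Running each result link through this before downloading leaves normal links alone, so it is harmless if the assumption turns out to be wrong.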
Hi Neo,
Thanks for the great script. I was hoping this script would support advanced Google search operators, e.g.
--q "site:whatever.com keyword1+keyword2"
This would be great. Can you make it happen, buddy? :-)
Best regards
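Google operators such as `site:` and `filetype:` are just part of the query text, so one way to support them is to build the query string before handing it to the search. This sketch uses a hypothetical `build_query` helper. Whether gsrchDwn.py itself turns `--ftype` into `filetype:` is an assumption on my part.

```python
def build_query(keywords, site=None, ftype=None):
    """Combine keywords with Google's site: and filetype: operators.

    Hypothetical helper: assumes the search accepts these operators
    as plain text inside the query string.
    """
    parts = [keywords]
    if site:
        parts.append("site:" + site)  # restrict results to one domain
    if ftype:
        parts.append("filetype:" + ftype)  # restrict results to one extension
    return " ".join(parts)
```

For example, `build_query("keyword1 keyword2", site="whatever.com", ftype="pdf")` gives `keyword1 keyword2 site:whatever.com filetype:pdf`.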