Yesterday I was curious about combining lynx with the Google search engine: with bash we can get quick results and automate the process. I also want to filter the URLs using sed or awk.
The first step is to establish the URL for a proper search. For this example I want to use this:
http://www.google.com/search?q=keywordforsearchhere&start=pagenumberhere
where search?q= takes the keyword and &start= selects the page of results (strictly speaking it is the offset of the first result, so with ten results per page start=10 begins the second page). As a text browser I use lynx with the -dump and -listonly options. lynx provides many command-line options, but for this test I only need these two: -dump writes the formatted document to standard output, and -listonly shows only the list of links.
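The URL layout described above can be sketched as a small helper that fills in the two parameters. This is only an illustration: the function name build_search_url is hypothetical, and it handles nothing beyond spaces in the keyword (other special characters would need proper percent-encoding).

```shell
#!/bin/bash
# Hypothetical helper (not part of the original post): builds the search URL
# from a keyword ($1) and a value for &start= ($2, defaults to 0).
build_search_url() {
    # Replace spaces with '+' so a multi-word keyword stays inside q=.
    keyword=$(printf '%s' "$1" | tr ' ' '+')
    start="${2:-0}"
    printf 'http://www.google.com/search?q=%s&start=%s\n' "$keyword" "$start"
}

build_search_url "house" 1
build_search_url "tree house" 10
```

The printed URL can then be handed to lynx in quotes, for example lynx "$(build_search_url house 1)" -dump -listonly.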
For the first test I use the keyword house and start=1:
lynx "http://www.google.com/search?q=house&start=1" -dump -listonly
It gives a result like the one in this pastie:
http://pastie.org/private/jlaakeglj0fsfga27tmoqg
The final result:
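lynx "http://www.google.com/search?q=house&start=1" -dump -listonly | grep 'url?q=' | cut -d ' ' -f4 | sed 's/http:\/\/www.google.com\/url?q=//' | sed 's/\(&sa=\).*//'
Here grep keeps only the result links, cut picks the field that holds the URL, and the two sed commands strip the Google redirect prefix and everything from &sa= onward.
Finally, the complete script: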
#!/bin/bash
# Google search using bash tools
# $1 is the keyword to search for
count=0   # page number (value passed to &start=)
while [ "$count" -le 200 ]
do
    lynx "http://www.google.com/search?q=$1&start=$count" -dump -listonly | grep 'url?q=' | cut -d ' ' -f4 | sed 's/http:\/\/www.google.com\/url?q=//' | sed 's/\(&sa=\).*//'
    count=$(( count + 5 ))
done
echo

Ciao
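The filtering stage of the pipeline can be tried offline, without calling lynx or Google. The sketch below feeds it a made-up sample of what a -dump -listonly listing might look like; the sample URLs, the assumed line layout, and the function names sample_dump and extract_urls are all hypothetical. It also swaps cut for awk, since I mentioned awk at the start: awk splits on any run of whitespace, so the URL is always field 2 no matter how wide the link number is, whereas cut -d ' ' -f4 depends on the exact number of leading spaces.

```shell
#!/bin/bash
# Made-up sample of a `lynx -dump -listonly` listing (illustration only;
# the URLs and the exact column layout are assumptions).
sample_dump() {
cat <<'EOF'
References

   1. http://www.google.com/webhp?hl=en
  12. http://www.google.com/url?q=http://example.com/house&sa=U&ved=abc
  13. http://www.google.com/url?q=http://example.org/homes/&sa=U&ved=def
 104. http://www.google.com/preferences?hl=en
EOF
}

# Hypothetical helper: reads a link listing on stdin, prints the target URLs.
extract_urls() {
    # 1) keep only the result links, 2) take the URL column,
    # 3) drop the redirect prefix and everything from &sa= onward.
    grep 'url?q=' \
        | awk '{ print $2 }' \
        | sed -e 's|^http://www\.google\.com/url?q=||' -e 's|&sa=.*$||'
}

sample_dump | extract_urls
```

With the sample above, only the two url?q= lines survive and come out as bare target URLs. In the real script, extract_urls would take the place of the grep, cut and sed stages after lynx.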