I was playing around with a way to fetch a number of files from a webserver automatically in Python (à la wget), and came up with the following script. It parses the HTML returned by a web page (an Apache-style directory index, for example), looks for all <a href> tags pointing at files that end with a given extension, and returns those filenames as a list. The rest of the script then steps through the list, downloading each file in turn into a specified target folder.
#!/usr/bin/python

from HTMLParser import HTMLParser
import httplib
import re

# Config options
target = "/tmp"

# DNS name, not a url
webserver = "www.example.com"
webpath = "/path/to/files"
ext = ".txt"

class AnchorParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Collect the href of every <a> tag whose target ends in ext
        # (re.escape makes the '.' in the extension match literally)
        if tag == 'a':
            for key, value in attrs:
                if key == 'href' and value and re.search(re.escape(ext) + '$', value):
                    self.items.append(value)

    def get_items(self):
        return self.items

# Setup an HTMLParser object
parser = AnchorParser()

# Get the HTML for the web directory
web = httplib.HTTPConnection(webserver)
web.request('GET', webpath)
data = web.getresponse()

# Pass this HTML to the parser
parser.feed(data.read())

# Get the returned list of filenames
filelist = parser.get_items()

# Get each file in turn
for item in filelist:
    print "Getting file: " + item
    web.request('GET', webpath + '/' + item)
    resp = web.getresponse()
    # Write out the received data to a file in 'target'
    # ('wb' keeps the write binary-safe for non-text files)
    with open(target + '/' + item, 'wb') as f:
        f.write(resp.read())
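The script above is Python 2 (the HTMLParser and httplib modules were renamed to html.parser and http.client in Python 3). For anyone on a current interpreter, here is a minimal sketch of the same approach in Python 3, using html.parser and urllib.request; the host and paths are the same placeholder values as above:

#!/usr/bin/env python3

from html.parser import HTMLParser
from urllib.request import urlopen
import os
import re

# Same placeholder config as the Python 2 version above
target = "/tmp"
webserver = "www.example.com"
webpath = "/path/to/files"
ext = ".txt"

class AnchorParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Collect every <a href> whose target ends in ext
        if tag == 'a':
            for key, value in attrs:
                if key == 'href' and value and re.search(re.escape(ext) + '$', value):
                    self.items.append(value)

base = 'http://' + webserver + webpath

# Fetch the directory page and feed it to the parser
parser = AnchorParser()
with urlopen(base) as page:
    parser.feed(page.read().decode('utf-8', errors='replace'))

# Download each matching file into the target folder
for item in parser.items:
    print("Getting file: " + item)
    with urlopen(base + '/' + item) as resp:
        with open(os.path.join(target, item), 'wb') as f:
            f.write(resp.read())

Note that both versions assume the hrefs are bare filenames relative to webpath; links that come back as absolute paths or full URLs would need urllib.parse.urljoin rather than simple string concatenation.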