- # import the modules we need
- import pycurl
- import StringIO
- # let's create a pycurl object
- c = pycurl.Curl()
- # let's specify the details of the FTP server
- c.setopt(pycurl.URL, r'ftp://ftp.ncbi.nih.gov/refseq/release/')
- # let's create a buffer to which the output will be written
- output = StringIO.StringIO()
- # let's assign this buffer to the pycurl object
- c.setopt(pycurl.WRITEFUNCTION, output.write)
- # let's perform the LIST operation
- c.perform()
- # let's get the output as a string
- result = output.getvalue()
- # let's print the string on screen
- print result
- # lines in the FTP LIST output are separated by \r\n
- # let's split the output into lines
- lines = result.split('\r\n')
- # let's print the number of lines
- print len(lines)
- # let's walk through each line
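- for line in lines:
-     # split the line on whitespace into at most 9 parts,
-     # so that a file name containing spaces stays in one piece
-     parts = line.split(None, 8)
-     # skip blank lines and lines that are not full entries (e.g. "total 12")
-     if len(parts) < 9: continue
-     # we can print the parts now
-     print parts
-     # the individual fields of a UNIX-style listing
-     # (parts[1] is the link count; the owner comes before the group)
-     permissions = parts[0]
-     user = parts[2]
-     group = parts[3]
-     size = parts[4]
-     month = parts[5]
-     day = parts[6]
-     yearortime = parts[7]
-     name = parts[8]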
The above program:
- Creates a pycurl object
- Specifies the URL of an FTP server (anonymous account)
- Creates a StringIO buffer to store the results of the FTP LIST command
- Associates the pycurl object with the StringIO buffer, so that output received from the FTP server is written to it
- Performs the curl operation
- Extracts the output
- Breaks the output into lines (treating \r\n as the separator)
- Walks through the lines one by one
- Splits each line on whitespace into parts
- Extracts the individual fields from the directory listing (permissions, user, group, size, filename, etc.)
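The parsing half of these steps can be tried without a network connection. The following is only a sketch: the function name parse_unix_list_line, the Entry tuple and the sample listing are made up for illustration, and it assumes a UNIX-style listing with nine whitespace-separated fields. It is written to run on both Python 2 and Python 3.

```python
from collections import namedtuple

# Field names follow the variables used in the program above;
# "links" is the link count that sits between permissions and owner.
Entry = namedtuple(
    "Entry", "permissions links user group size month day yearortime name"
)


def parse_unix_list_line(line):
    """Parse one line of a UNIX-style LIST response, or return None."""
    # Split at most 8 times so that a name containing spaces stays whole.
    parts = line.split(None, 8)
    if len(parts) < 9:
        # Blank line, "total N" line, or a format this sketch does not know.
        return None
    return Entry(*parts)


# Illustrative sample, not captured from a real server.
sample = (
    "total 2\r\n"
    "drwxr-xr-x   2 ftp      anonymous     4096 Jan 10  2011 release notes\r\n"
    "-r--r--r--   1 ftp      anonymous  1048576 Mar 03 09:15 README\r\n"
)

entries = [e for e in map(parse_unix_list_line, sample.split("\r\n")) if e]
for e in entries:
    print("%s (%s bytes, owner %s)" % (e.name, e.size, e.user))
```

Returning None for anything that does not have nine fields keeps the caller from hitting an IndexError on blank or summary lines.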
Notes about processing the output of FTP LIST command
The response to the FTP LIST command is not standardized. Different flavors of FTP servers simply format the directory listing differently. Parsing this output may be easy for one FTP server, but code that parses directory listings from all kinds of FTP servers is difficult to write. This is probably the reason why this functionality is not provided in ftplib (Python Standard Library). In the FTP standard, the output of the LIST command was intended for human consumption rather than machine interpretation, which led to all the variations over the years.
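To make the variation concrete, here is a rough sketch that tells two common listing shapes apart: the UNIX "ls -l" style and the DOS style used by some Windows FTP servers. The regular expressions, the function name classify and the sample lines are my own illustrations, not something taken from a specification, and real servers will produce lines that neither pattern matches.

```python
import re

# UNIX style: permissions, link count, owner, group, size, date, name.
UNIX_RE = re.compile(
    r"^(?P<permissions>[-dlbcps][-rwxsStT]{9})\s+\d+\s+"
    r"(?P<user>\S+)\s+(?P<group>\S+)\s+(?P<size>\d+)\s+"
    r"(?P<date>\w{3}\s+\d{1,2}\s+[\d:]+)\s+(?P<name>.+)$"
)

# DOS style: date, time, then either <DIR> or a size, then the name.
DOS_RE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{2,4}\s+\d{2}:\d{2}[AP]M)\s+"
    r"(?P<size><DIR>|\d+)\s+(?P<name>.+)$"
)


def classify(line):
    """Return (style, name, size) for a listing line; style may be 'unknown'."""
    for style, pattern in (("unix", UNIX_RE), ("dos", DOS_RE)):
        match = pattern.match(line)
        if match:
            return style, match.group("name"), match.group("size")
    return "unknown", None, None


# Illustrative lines, not captured from real servers.
samples = [
    "-rw-r--r--   1 ftp      anonymous     1024 Jan 10  2011 README",
    "02-17-11  10:15AM       <DIR>          pub",
    "02-17-11  10:20AM                 2048 readme.txt",
    "some line in neither format",
]

results = [classify(line) for line in samples]
for line, result in zip(samples, results):
    print("%-8s %r" % (result[0], line))
```

Every additional server flavor needs another pattern of this kind, which is exactly why a general-purpose parser is hard to get right.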
ftpparse (http://cr.yp.to/ftpparse.html) is a library for parsing FTP LIST command responses from a variety of FTP servers. ftpparse currently understands the LIST output from any UNIX server, Microsoft FTP Service, Windows NT FTP Server, VMS, WFTPD, NetPresenz, NetWare, and MSDOS. It's easy to write a Python wrapper for this library using ctypes.
Even this library doesn't work in a number of situations:
- When the size of a file is bigger than 2 GB.
- With the FTP servers built into various video servers (I have seen GUI FTP clients like FileZilla or Windows Explorer choke on some of them).
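A ctypes wrapper for ftpparse might look roughly like the sketch below. Treat it as a starting point only. It assumes that you have compiled ftpparse.c into a shared library yourself (the file name libftpparse.so is hypothetical), that the struct layout matches what I remember of ftpparse.h, and that time_t is a C long on your platform. Check all of this against the header before relying on it. The names FtpParse, load_ftpparse and parse_line are my own.

```python
import ctypes


class FtpParse(ctypes.Structure):
    # Assumed to mirror "struct ftpparse" from ftpparse.h.
    # Verify the field order and types against the header you compile.
    _fields_ = [
        ("name", ctypes.POINTER(ctypes.c_char)),  # not necessarily NUL-terminated
        ("namelen", ctypes.c_int),
        ("flagtrycwd", ctypes.c_int),
        ("flagtryretr", ctypes.c_int),
        ("sizetype", ctypes.c_int),
        ("size", ctypes.c_long),
        ("mtimetype", ctypes.c_int),
        ("mtime", ctypes.c_long),  # time_t, assumed here to be a C long
        ("idtype", ctypes.c_int),
        ("id", ctypes.POINTER(ctypes.c_char)),
        ("idlen", ctypes.c_int),
    ]


def load_ftpparse(path="./libftpparse.so"):
    """Load the shared library (hypothetical path) and declare ftpparse()."""
    lib = ctypes.CDLL(path)
    lib.ftpparse.argtypes = [
        ctypes.POINTER(FtpParse),
        ctypes.c_char_p,
        ctypes.c_int,
    ]
    lib.ftpparse.restype = ctypes.c_int
    return lib


def parse_line(lib, line):
    """Parse one LIST line (a byte string); return a dict, or None on failure."""
    # The struct points into this buffer, so keep it alive while reading fields.
    buf = ctypes.create_string_buffer(line)
    fp = FtpParse()
    if not lib.ftpparse(ctypes.byref(fp), buf, len(line)):
        return None
    return {
        "name": ctypes.string_at(fp.name, fp.namelen),
        "size": fp.size,
        "mtime": fp.mtime,
    }
```

If the size field really is a C long, as assumed above, that would explain the 2 GB limit on platforms where a long is 32 bits wide.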