Programming tidbits: June 2010

Most of us programmers know about endianness. We know that Intel microprocessors are little-endian while Motorola microprocessors are big-endian. Most Internet protocols, as well as most audio/video binary formats, also follow the big-endian style. Yet I wonder how many of us know the origin (etymology) of these terms. I didn't know it before today, and I found it quite funny.

According to Wikipedia: The term big-endian originally comes from Jonathan Swift's satirical novel Gulliver’s Travels by way of Danny Cohen in 1980.^[2] In 1726, Swift described tensions in Lilliput and Blefuscu: whereas royal edict in Lilliput requires cracking open one's soft-boiled egg at the small end, inhabitants of the rival kingdom of Blefuscu crack theirs at the big end (giving them the moniker Big-endians).^[6] The terms little-endian and endianness have a similar intent.^[7]

"On Holy Wars and a Plea for Peace"^[2] by Danny Cohen ends with: "Swift's point is that the difference between breaking the egg at the little-end and breaking it at the big-end is trivial. Therefore, he suggests, that everyone does it in his own preferred way. We agree that the difference between sending eggs with the little- or the big-end first is trivial, but we insist that everyone must do it in the same way, to avoid anarchy. Since the difference is trivial we may choose either way, but a decision must be made."
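Cohen's "same way" is easy to see in code. The following is a minimal sketch using Python's standard struct module; the value 0x0A0B0C0D is just an arbitrary example. It packs the same 32-bit integer in both byte orders and shows what happens when the reader assumes a different order than the writer.

```python
import struct
import sys

value = 0x0A0B0C0D  # arbitrary example value

# '<' forces little-endian, '>' forces big-endian,
# '!' is network byte order (which is big-endian)
little = struct.pack('<I', value)
big = struct.pack('>I', value)
network = struct.pack('!I', value)

print(little.hex())    # 0d0c0b0a
print(big.hex())       # 0a0b0c0d
print(network == big)  # True

# Reading little-endian bytes as if they were big-endian
# silently yields a different number
misread = struct.unpack('>I', little)[0]
print(hex(misread))    # 0xd0c0b0a

# Byte order of the machine running this script
print(sys.byteorder)
```

Neither order is better, but both ends have to pick the same one, which is exactly the decision Cohen asks for.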

I first read about this story in the Memory Management Programming Guide for Core Foundation. I hope you find it interesting.

import pycurl
from io import BytesIO

# create a pycurl object
c = pycurl.Curl()
# specify the URL of the FTP server (anonymous login)
c.setopt(pycurl.URL, 'ftp://ftp.ncbi.nih.gov/refseq/release/')
# create an in-memory buffer for the output (pycurl delivers bytes)
output = BytesIO()
# have pycurl write everything it receives into this buffer
c.setopt(pycurl.WRITEFUNCTION, output.write)
# perform the transfer; for a directory URL this is the LIST operation
c.perform()
c.close()
# get the output as a string
result = output.getvalue().decode('utf-8', 'replace')
# print the raw listing on screen
print(result)
# FTP LIST output is separated by \r\n, so split it into lines
lines = result.split('\r\n')
# print the number of lines
print(len(lines))
# walk through each line
for line in lines:
    # split on whitespace, at most 8 times, so that a file name
    # containing spaces stays in one piece
    parts = line.split(None, 8)
    # skip blank lines and anything that is not a 9-field UNIX-style entry
    if len(parts) < 9:
        continue
    # print the parts
    print(parts)
    # the individual fields of a UNIX-style listing line
    permissions = parts[0]
    user = parts[2]
    group = parts[3]
    size = parts[4]
    month = parts[5]
    day = parts[6]
    yearortime = parts[7]
    name = parts[8]

The above program:

- Creates a pycurl object
- Specifies the URL of an FTP server (anonymous account)
- Creates an in-memory buffer to store the results of the FTP LIST command
- Associates the pycurl object with the buffer, so that output received from the FTP server is written to it
- Performs the curl operation
- Extracts the output
- Breaks the output into lines (using \r\n as the separator)
- Walks through the lines one by one
- Splits each line on whitespace into parts
- Extracts the individual fields from the directory listing (permissions, user, group, size, file name, etc.)
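The last three steps can be tried without any FTP server. Here is a minimal sketch that isolates them in a small function and runs it on a canned listing. The function name parse_unix_list_line, the ListEntry tuple and the two sample lines are made up for illustration, and the field layout assumes a UNIX-style (ls -l) listing.

```python
from collections import namedtuple

# Field layout of a UNIX-style LIST line (an assumption, see above)
ListEntry = namedtuple(
    "ListEntry",
    "permissions links user group size month day year_or_time name",
)

def parse_unix_list_line(line):
    """Return a ListEntry, or None if the line does not have 9 fields."""
    # At most 8 splits, so a name containing spaces stays whole
    parts = line.split(None, 8)
    if len(parts) < 9:
        return None
    return ListEntry(*parts)

# Made-up sample output, separated by \r\n like a real LIST response
sample = (
    "drwxr-xr-x   2 ftp      anonymous     4096 Jun 12  2010 release notes\r\n"
    "-r--r--r--   1 ftp      anonymous  1048576 Jun 14 09:30 README\r\n"
)

entries = []
for line in sample.split("\r\n"):
    entry = parse_unix_list_line(line)
    if entry is not None:
        entries.append(entry)

for entry in entries:
    print(entry.name, entry.size, entry.user, entry.group)
```

All fields come back as strings; converting size to an integer or the date fields to a real timestamp is left out to keep the sketch short.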

Notes about processing the output of FTP LIST command

The response to the FTP LIST command is not standardized. Different FTP servers format the directory listing differently, so parsing the output of one particular server may be easy, but code that parses directory listings across all kinds of FTP servers is difficult to write. This is probably why ftplib (in the Python Standard Library) does not provide such parsing. In the FTP standard, the output of the LIST command was intended for human consumption rather than machine interpretation, which led to all the variations over the years.
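To make the variation concrete, here is a small sketch that guesses which flavor a listing line belongs to. The two regular expressions, the function name guess_list_style and the sample lines are illustrative assumptions covering only a UNIX-style and an MS-DOS-style listing; real servers produce many more variants.

```python
import re

# UNIX style: a file type character followed by nine permission characters
UNIX_STYLE = re.compile(r"^[-dlbcps][-rwxsStT]{9}\s")
# MS-DOS style: a date such as 06-14-10 followed by a time such as 10:15AM
DOS_STYLE = re.compile(r"^\d{2}-\d{2}-\d{2,4}\s+\d{2}:\d{2}(AM|PM)\s")

def guess_list_style(line):
    """Return 'unix', 'dos' or 'unknown' for a single LIST line."""
    if UNIX_STYLE.match(line):
        return "unix"
    if DOS_STYLE.match(line):
        return "dos"
    return "unknown"

# Made-up sample lines describing the same file in two styles
unix_line = "-rw-r--r--   1 ftp      ftp       1048576 Jun 14 09:30 README.txt"
dos_line = "06-14-10  10:15AM              1048576 README.txt"

print(guess_list_style(unix_line))   # unix
print(guess_list_style(dos_line))    # dos
print(guess_list_style("total 12"))  # unknown

# Splitting on whitespace gives a different number of fields per style
print(len(unix_line.split()), len(dos_line.split()))  # 9 4
```

A parser that indexes fields by position, as the pycurl program does, therefore only works for the style it was written for.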

ftpparse (http://cr.yp.to/ftpparse.html) is a library for parsing FTP LIST command responses from a variety of FTP servers. It currently understands the LIST output from any UNIX server, Microsoft FTP Service, Windows NT FTP Server, VMS, WFTPD, NetPresenz, NetWare, and MSDOS. It's easy to write a Python wrapper for this library using ctypes.

Even this library doesn't work in a number of situations:
- When the size of a file is bigger than 2 GB.
- With the FTP servers built into various video servers (I have seen GUI FTP clients like FileZilla and Windows Explorer choke on some of them).
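A 2 GB limit is what one would expect if a file size is kept in a signed 32-bit integer. Whether that is the actual cause here is my assumption, not something verified against the library. The sketch below only demonstrates the arithmetic, using ctypes.c_int32 to mimic such a field; the helper name as_int32 is made up.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

def as_int32(size):
    """Return size as it would read back from a signed 32-bit field."""
    # ctypes integer types do no overflow checking, the value simply wraps
    return ctypes.c_int32(size).value

small = 700 * 1024**2  # 700 MB, fits comfortably
large = 3 * 1024**3    # 3 GB, does not fit

print(INT32_MAX)                 # 2147483647
print(as_int32(small) == small)  # True
print(as_int32(large))           # -1073741824
```

Any size above 2147483647 bytes comes back wrong, so code built on such a field cannot report large files correctly.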


Monday, June 14, 2010

The origin of the term Endianness

Saturday, June 12, 2010

A simple program for FTP directory listing using pycurl
