Monday, June 14, 2010

The origin of the term Endianness

So most of us programming people know about endianness. We know that Intel microprocessors are little-endian while Motorola microprocessors are big-endian. Most Internet protocols, as well as many audio/video binary formats, also follow the big-endian style. Yet I wonder how many of us know the origin (etymology) of these terms. I myself didn't know about it before today, and I found it quite funny.
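As a quick reminder of what the two byte orders look like, here is a minimal sketch using Python's standard struct module. The value 0x0A0B0C0D is just an arbitrary example, not something from any particular protocol.

```python
import struct

value = 0x0A0B0C0D  # arbitrary 32-bit example value

# '<' means little-endian: least significant byte first (as on Intel x86)
little = struct.pack('<I', value)
# '>' means big-endian: most significant byte first
big = struct.pack('>I', value)
# '!' means network byte order, which is big-endian
network = struct.pack('!I', value)

print(little.hex())    # 0d0c0b0a
print(big.hex())       # 0a0b0c0d
print(network == big)  # True
```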

According to Wikipedia, the term big-endian originally comes from Jonathan Swift's satirical novel Gulliver’s Travels by way of Danny Cohen in 1980[2]. In 1726, Swift described tensions in Lilliput and Blefuscu: whereas royal edict in Lilliput requires cracking open one's soft-boiled egg at the small end, inhabitants of the rival kingdom of Blefuscu crack theirs at the big end (giving them the moniker Big-endians).[6] The terms little-endian and endianness have a similar intent.[7]

"On Holy Wars and a Plea for Peace"[2] by Danny Cohen ends with: "Swift's point is that the difference between breaking the egg at the little-end and breaking it at the big-end is trivial. Therefore, he suggests, that everyone does it in his own preferred way. We agree that the difference between sending eggs with the little- or the big-end first is trivial, but we insist that everyone must do it in the same way, to avoid anarchy. Since the difference is trivial we may choose either way, but a decision must be made."



I first read about this story in the Memory Management Programming Guide for Core Foundation. I hope you find it interesting.


Saturday, June 12, 2010

A simple program for FTP directory listing using pycurl

    import pycurl
    from io import BytesIO

    # Create a pycurl object
    c = pycurl.Curl()
    # Specify the FTP server; a URL ending in '/' makes curl list the directory
    c.setopt(pycurl.URL, 'ftp://ftp.ncbi.nih.gov/refseq/release/')
    # Create a buffer to collect the output (pycurl delivers bytes)
    output = BytesIO()
    # Have pycurl write the received data into this buffer
    c.setopt(pycurl.WRITEFUNCTION, output.write)
    # Perform the LIST operation and release the handle
    c.perform()
    c.close()
    # Get the output as a string and print it
    result = output.getvalue().decode('utf-8', 'replace')
    print(result)
    # FTP LIST output lines are separated by \r\n
    lines = result.split('\r\n')
    # Print the number of lines
    print(len(lines))
    # Walk through each line
    for line in lines:
        # Split into at most 9 fields so a file name with spaces stays intact
        parts = line.split(None, 8)
        print(parts)
        # Skip blank or unexpected lines (e.g. the empty string after the last \r\n)
        if len(parts) < 9:
            continue
        # The individual fields of a UNIX-style listing line
        permissions = parts[0]
        links = parts[1]
        user = parts[2]
        group = parts[3]
        size = parts[4]
        month = parts[5]
        day = parts[6]
        yearortime = parts[7]
        name = parts[8]

The above program

  • Creates a pycurl object
  • Specifies the URL of an FTP server (anonymous account)
  • Creates a BytesIO buffer to store the results of the FTP LIST command
  • Associates the pycurl object with the buffer for writing the output received from the FTP server
  • Performs the curl operation
  • Extracts the output and decodes it into a string
  • Breaks the output into lines (using \r\n as the separator)
  • Walks through the lines one by one
  • Splits each line on whitespace into its parts
  • Extracts the different fields from the directory listing (permissions, user, group, size, file name, etc.)
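The field extraction can also be packaged as a small function that is easy to try without a network connection. This is only a minimal sketch (written for Python 3), assuming a UNIX-style listing with nine whitespace-separated fields. The names ListEntry and parse_unix_list_line and the sample line are made up for illustration.

```python
from collections import namedtuple

# Hypothetical record type for one directory entry
ListEntry = namedtuple(
    'ListEntry',
    'permissions links user group size month day year_or_time name')


def parse_unix_list_line(line):
    """Parse one UNIX-style LIST line; return None if it does not fit."""
    # Split into at most 9 fields so a name containing spaces stays whole
    parts = line.split(None, 8)
    if len(parts) < 9:
        return None  # blank line, "total N" line, or another format
    try:
        links = int(parts[1])
        size = int(parts[4])
    except ValueError:
        return None  # e.g. device files, which list "major, minor" here
    return ListEntry(parts[0], links, parts[2], parts[3], size,
                     parts[5], parts[6], parts[7], parts[8])


# Made-up sample line, not copied from a real server
sample = '-r--r--r--   1 ftp      anonymous   123456 Jun 11  2010 release notes.txt'
entry = parse_unix_list_line(sample)
print(entry.user, entry.group, entry.size, entry.name)
```

Lines that do not have nine fields, such as an empty line or a "total" line, simply come back as None.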

Notes about processing the output of FTP LIST command

The response to the FTP LIST command is not standardized. Different flavors of FTP servers format the directory listing differently. So parsing this output may be easy for one FTP server, but code that parses directory listings across all kinds of FTP servers is difficult to write. This is probably the reason why this functionality is not provided in ftplib (Python Standard Library). In the FTP standard, the output of the LIST command was intended for human consumption rather than computer interpretation, which led to all the variations over the years.
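To make the variation concrete, here is a small sketch that guesses the style of a listing line. The two sample lines are illustrative (typed by hand in the commonly seen UNIX and MS-DOS layouts, not captured from real servers), and the function name guess_list_style and both regular expressions are my own assumptions, covering only these two styles.

```python
import re

# Illustrative sample lines, not captured from a real server
UNIX_LINE = '-rw-r--r--   1 ftp      ftp          1024 Jun 12  2010 README'
DOS_LINE = '06-12-10  03:15PM                 1024 README.txt'

# UNIX style starts with a file type character and nine permission characters
UNIX_RE = re.compile(r'^[-dlbcps][-rwxsStT]{9}\s')
# MS-DOS style starts with a date and a 12-hour time
DOS_RE = re.compile(r'^\d{2}-\d{2}-\d{2,4}\s+\d{2}:\d{2}(AM|PM)\s')


def guess_list_style(line):
    """Return 'unix', 'msdos' or 'unknown' for one LIST line."""
    if UNIX_RE.match(line):
        return 'unix'
    if DOS_RE.match(line):
        return 'msdos'
    return 'unknown'


print(guess_list_style(UNIX_LINE))  # unix
print(guess_list_style(DOS_LINE))   # msdos
print(len(DOS_LINE.split()))        # 4
```

The MS-DOS style line has only four fields, so a parser that expects nine UNIX-style fields cannot handle it.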

ftpparse (http://cr.yp.to/ftpparse.html) is a C library for parsing FTP LIST command responses from a variety of FTP servers. ftpparse currently understands the LIST output from any UNIX server, Microsoft FTP Service, Windows NT FTP Server, VMS, WFTPD, NetPresenz, NetWare, and MSDOS. It's easy to write a Python wrapper for this library using ctypes.
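A rough sketch of what such a wrapper could look like follows. Please treat it as an assumption-laden starting point: the structure layout is my transcription of struct ftpparse and should be checked against ftpparse.h, mapping time_t to c_long is a guess that holds on typical 64-bit Linux systems, and the helper names entry_name and parse_line are made up. ftpparse ships as C source, so you would have to build a shared library yourself; the file name libftpparse.so in the usage note is hypothetical.

```python
import ctypes


class FtpParse(ctypes.Structure):
    # Assumed to mirror `struct ftpparse` from ftpparse.h; verify before use.
    _fields_ = [
        ('name', ctypes.POINTER(ctypes.c_char)),  # not necessarily NUL-terminated
        ('namelen', ctypes.c_int),
        ('flagtrycwd', ctypes.c_int),
        ('flagtryretr', ctypes.c_int),
        ('sizetype', ctypes.c_int),
        ('size', ctypes.c_long),
        ('mtimetype', ctypes.c_int),
        ('mtime', ctypes.c_long),                 # time_t; c_long is an assumption
        ('idtype', ctypes.c_int),
        ('id', ctypes.POINTER(ctypes.c_char)),    # not necessarily NUL-terminated
        ('idlen', ctypes.c_int),
    ]


def entry_name(fp):
    """Copy the file name out of the structure as bytes."""
    return ctypes.string_at(fp.name, fp.namelen)


def parse_line(lib, line):
    """Parse one LIST line (bytes) with an already loaded ftpparse library.

    Returns (name, size) or None if ftpparse finds no file name in the line.
    """
    buf = ctypes.create_string_buffer(line)  # must outlive the call: name points into it
    fp = FtpParse()
    found = lib.ftpparse(ctypes.byref(fp), buf, len(line))
    if not found:
        return None
    return entry_name(fp), fp.size
```

With a library built from the ftpparse sources, usage would be along the lines of lib = ctypes.CDLL('./libftpparse.so') followed by parse_line(lib, b'...one LIST line...').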

Even this library doesn't work in a number of situations:
- When the size of a file is bigger than 2 GB.
- With the FTP servers built into various video servers (I have seen GUI FTP clients like FileZilla or Windows Explorer choke on some of them).
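One plausible explanation for the 2 GB limit, assuming the file size ends up in a 32-bit signed integer (as a C long does on 32-bit platforms), is plain integer overflow. The sketch below only demonstrates that effect with ctypes; it does not call ftpparse, and the 3 GB size is a made-up example.

```python
import ctypes

size_3gb = 3 * 1024 ** 3   # 3221225472 bytes, a made-up file size
max_int32 = 2 ** 31 - 1    # 2147483647, largest 32-bit signed value

# ctypes does no overflow checking, so the value wraps around
wrapped = ctypes.c_int32(size_3gb).value

print(size_3gb > max_int32)  # True
print(wrapped)               # -1073741824
# A Python int has no such limit when parsing the size field directly
print(int('3221225472'))     # 3221225472
```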