Tuesday, December 15, 2009
Dojo Toolbox - SitePen, Inc.
Monday, December 14, 2009
Google Chrome Extensions: Blog This! (by Google)
1000 points on Stack Overflow
This was a very interesting experience.
Tuesday, October 20, 2009
A simple lexical analyzer demo in Haskell
Here is the output of this effort:
{-
This script implements a simple lexical analysis program
based on the example in Modern Compiler Design Book Section 2.1
Tokens
--------
The lexical analyzer supports following tokens.
Operators: +, -, * and /
Separators: ; , {} ( and )
Integers (a string of digits)
Identifiers: first character must be a letter
other characters can be letter or digit or underscore
two underscores consecutively are not allowed
last character cannot be an underscore
EOF token : represents end of file
Also supported are:
Comments: They start with # and end with #
Layout characters: all characters less than equal to ' ' are
considered as layout characters
Limitations
----------------
Some specific limitations of this implemenation are:
- Its a limited form of recursive descent parser. Based on
the first character in the token, we assume the token type
and continue to read the rest of token.
- It doesn't track the source code position. This would actually
require some form of monad since the source code position tracking
is a sort of side effect to the lexical analysis.
- The lexical analyzer doesn't support too many debugging facilities.
- Lack of unit tests at this point of time.
-}
{- following data type defines the tokens supported by this lexer
the C version in the MCD book uses #defines for different token
classes. Haskell version is much more cleaner with proper data
constructors attached to each token class and automatically
generated show and eq functions.
-}
import System.IO
-- a token can be
data Token = EOF | -- end of file
Identifier String | -- an identifier string
IntegerToken Integer | -- an integer
Operator Char | -- an operator
Separator Char | -- a separator
Error String -- or some erroneous token
deriving (Show, Eq)
-- lexer reads a string and returns a list of tokens
lexer :: String->[Token]
{- we need to remove comments and other layout characters
before reading a token. This is an example of conducting
some preprocessing before doing an actual operation.
Such a situation should be implemented using two sister
functions (here lexer and lexer').
lexer does the job of removing layout and comments
lexer' does the job of reading a token
lexer calls lexer' (for remaining data)
lexer' calls back lexer (for remaining data)
This kind of strucutre eliminates complicated if-then-else logic.
-}
lexer xs = let s = skipLayoutAndComment xs
in lexer' s
-- for an empty string, lex returns an empty list
lexer' [] = [EOF]
-- for a non-empty string we read the first token
-- then we let lexer tokenize the rest of string
lexer' xs = let (t, rest)= nextToken xs
in (t:lexer rest)
-- next token reads the input string and returns the next token and
-- rest of string
nextToken:: String->(Token, String)
nextToken [] = (EOF, [])
{- The implementation is a five way branch based on the first character.
If the first character is a letter, we read an identifier.
If the first character is a digit, we read a number.
If the first character is an operator, we simply read an operator token.
If the first character is a separator, we simply read a separator token.
Otherwise, we consider this as an erroneous character and generate
an error token for this character.
The lexical analyzer shouldn't stop analyzing the file at any point
of time. If it doesn't recognize any character or a string of characters
in the file, then it should simply generate an Error token at that point
of time and proceed further with reading next tokens.
It should be the job of parser to identify real syntax errors in the
code and report the user accordingly.
-}
nextToken l@(x:xs) | isLetter x = let (first, rest) = readIdentfier l
in (Identifier first, rest)
| isNumDigit x = let (first, rest) = readInteger l
in (IntegerToken first, rest)
| isOperator x = (Operator x, xs)
| isSeparator x = (Separator x, xs)
| otherwise = (Error [x], xs)
-- reads an integer and returns the token plus rest of string
readInteger xs = let (first, rest) = span isNumDigit xs in
(read first :: Integer, rest)
{-
An identifier consists of firstCharacter and restCharacter*
firstCharacter can be a letter
restCharacter can be letter or digit or underscore
more than one underscores consecutively are not allowed
this should be called only if first character of the input
list is a letter.
Here we assume that the first character is a letter. This
is actually ensured by the nextToken function defined above.
We thus focus on reading rest of identfier.
-}
readIdentfier [] = ([], [])
readIdentfier l@(x:xs) = let (first, rest) = readLettersOrDigits xs
(first', rest') = underscoreTails rest
identifier = (x:first) ++ first'
in (identifier, rest')
readLettersOrDigits :: String->(String, String)
{-
Reads an input string and splits it in two parts. The
first part (possibly empty) consists of a sequence of letter
or digit characters.
The 2nd part contains the rest of input string.
-}
readLettersOrDigits xs = span isLetterOrDigit xs
underscoreTail :: String->(String, String)
{-
Reads an undscore tail part of an identifier
Following rules must be observed:
- the tail starts with an underscore
- the underscore must be followed by a letter or a digit
- following this, there may be 0 or more letters or digits
-}
-- empty list would result in empty underscore tail
underscoreTail [] = ([], [])
-- at least two characters are required in undescore tail
underscoreTail [x] = ([], [x])
-- the first character must be an underscore
-- 2nd character must be a letter or digit
-- following characters (if any) must be letter or digit
underscoreTail l@(x:y:xs) = if isUnderScore x && isLetterOrDigit y
-- lets read the rest of letters/digits of this
-- tail
then let (first,rest) = readLettersOrDigits xs
in (x:y:first, rest)
-- there is no underscoreTail here
else ("", l)
underscoreTails :: String->(String, String)
-- reads 0 or more occurances of underscoreTail
underscoreTails xs = let -- lets try to read the first underscoreTail
(first, rest) = underscoreTail xs in
if null first
-- if first tail is absent, then we just return
then (first, rest)
-- otherwise we read rest of the tail recursively
else let (first', rest') = underscoreTails rest
-- we join the first tail and rest of the tail
-- we return this along with unread string
in (first ++ first' , rest')
-- skip layout and comment
skipLayoutAndComment = skipLayout . skipComment
-- skips layout
skipLayout [] = []
skipLayout l@(x:xs) = case isLayout x of
True -> skipLayout xs
False -> l
skipComment :: String->String
{- skips comment
The first character must be #.
Following this, we need to keep skipping till we find
another #.
-}
skipComment [] = []
skipComment l@(x:xs) = case isCommentStart x of
True -> -- we need to find the end of comment
skipTillCommentEnd xs
False -> l
-- skips till the end of comment character is found
-- this should be called from skip comment function
-- this routine is perfect for recursive tail optimization
skipTillCommentEnd [] = []
skipTillCommentEnd l@(x:xs) = case isCommentEnd x of
True-> xs
False-> skipTillCommentEnd xs
-- character level tests
-- a layout character
isLayout ch = ch <= ' '
-- comment start character
isCommentStart ch = ch == '#'
-- comment end character
isCommentEnd ch = ch == '#'
-- upper case letter
isUpperCase ch = 'A' <= ch && ch <= 'Z'
-- lower case letter
isLowerCase ch = 'a' <= ch && ch <= 'z'
-- a letter
isLetter ch = isLowerCase ch || isUpperCase ch
-- a digit
isNumDigit ch = '0' <= ch && ch <= '9'
-- a letter or a digit
isLetterOrDigit ch = isLetter ch || isNumDigit ch
-- an underscore
isUnderScore ch = ch == '_'
-- an operator
isOperator ch = ch `elem` "+-*/"
-- a separator
isSeparator ch = ch `elem` ";,(){}"
main :: IO ()
main = do
putStrLn "Simple Lexical Analyzer v0.1"
mainloop
mainloop :: IO ()
mainloop = do
putStr ">>> "
hFlush stdout
inputStr <- getLine
case inputStr of
"q" -> return ()
otherwise ->
do
let tokens = lexer inputStr
putStrLn $ show tokens
mainloop
A typical session for this program looks like follows:
runghc MCDLex.hs
Simple Lexical Analyzer v0.1
>>> abc
[Identifier "abc",EOF]
>>> 2344
[IntegerToken 2344,EOF]
>>> +
[Operator '+',EOF]
>>> +-/*
[Operator '+',Operator '-',Operator '/',Operator '*',EOF]
>>> (){},;
[Separator '(',Separator ')',Separator '{',Separator '}',Separator ',',Separator ';',EOF]
>>> #thiscommentwillbeignored" todo_list
[EOF]
>>> #thiscommentwillbeignored# todo_list
[Identifier "todo_list",EOF]
>>> (3+4)
[Separator '(',IntegerToken 3,Operator '+',IntegerToken 4,Separator ')',EOF]
>>> errors__
[Identifier "errors",Error "_",Error "_",EOF]
>>> 23___45
[IntegerToken 23,Error "_",Error "_",Error "_",IntegerToken 45,EOF]
>>>
Note: I tried to use the "prettyprint" javascript support to beutify this code. But it seems like the pretty printer is not able to parse haskell properly. Hence I am removing it and showing the code in plain pre tag.
Sunday, October 18, 2009
AST annotations
Thursday, October 15, 2009
Hope Lexical analyzer is working
- Parsec tutorial and reference guide
- Real world Haskell (online book)
- Develop your own scheme interpreter in 24 hours
Thursday, October 1, 2009
PyCon India 2009 My impressions
The conference was organized at IISc Bangalore. From the main gate to reach the conference area was easy as proper directions had been placed all over. Registration was a bit chaotic. There were not enough desks to handle the huge crowd waiting to be registered. I guess it was still quite satisfactory process considering the fact that it was first such event in India and there were not enough funds (I guess) to manage the event professionally. Since I had pre-registered, I had to pay just 200 rupees for attendance. They provided us with schedule, a T shirt, a notebook, a pen and off course a badge which was quite good as a conference kit. And the lunch and tea breaks were excellent.
There were about 600+ delegates participating in the conference. The profile varied between students, professors, independent consultants, professional programmers etc.
The first session by Prabhu Ramchandran (My adventures) was quite nice. It was interesting to see how an open source project Mayavi had been run successfully over the years out of India. Since there were three parallel tracks going on, hence I had to identify the sessions I wished to attend.
Some of the sessions had a lot of issues related to presentation. The presentation on zc.buildout was quite boring. The presenter spent too much time just discussing what is isolation and repeatability that there wasn't much time left for actual demonstration. Similarly, the session on DJango turned out to be quite boring. There were three presenters taking turns in the talk. They spent a lot of time preaching about DJango, like what it is, why it is great, etc. etc. They also spend a lot of time typing the code which they wished to demonstrate. I guess one should be very careful in using the 45 minutes available during a conference time and make the best use of it as possible. One should avoid time wasters like typing code and fixing bugs during the talk itself. Another boring session was on Semantic web. The presenter was explaining a whole lot of difficult concepts. It seemed like a typical IIT lecture in which students preferred to sleep through or if possible bunk them.
Another interesting talk was 'Jython at Strand LS' by Janaki where the speaker explained how they used Jython at their company. Examples were short and sweet. Illustrations were easy to understand and the speaker had a great sense of humour. I was impressed to see the extensive application of Jython in a production quality platform like AVADIS (TM) at Strand.
Finally, my own talk on using ZPT and RML for generating PDF in Python applications came. I didn't expect too many to attend the talk but I was happy to see about 30+ ppl still continuing to sit for my session. Some of them left during the session which meant that they didn't really enjoy the content much. Others continued through the session and I guess they liked what I talked. I would leave it for others to comment on how good the session was.
Some of the things I feel that should be improved for future conferences:
- Expected audience of a talk should be more clearly explained
- Focus should be more on specific coding workshops rather than general talks
- Longer workshops on specific topics should be conducted if there are speakers available
- There was no peer review of presentation material. This should be extensively fixed.
- The work on presentation materials should actually finish 2-3 months before the conference at least so that peer reviews and revisions can be completed say about 2 weeks before actual conference.
- Scheduling of talks should happen through a more transparent process
- Registered delegates should receive notifications about talks (when its announced, when presentation materials are uploaded etc.) over mail as the process keeps on going before the actual conference
- The talks without presentation material should be rejected in the final schedule.
- Speakers should avoid wasting too much time talking about abstract concepts and spend more on illustrations. They should also avoid typing code during the session.
- Multi-speaker talks should be minimized
Wednesday, September 30, 2009
Hope Project at Kenai
The site is hosted at http://kenai.com/projects/hope. I will continue to blog about this from this blog.
Wednesday, August 19, 2009
Quicksort in Haskell
quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []
quicksort (x:xs) = smaller ++ [x] ++ bigger
where smaller = quicksort [y| y <- xs , y < x]
bigger = quicksort [y | y <- xs, y >= x]
So how does this work?
- [a] ->[a] means that quicksort takes a list as input and returns a list as output
- (Ord a) means that a belongs to a type class Ord (i.e. a is an ordered type)
- The first expression means that quicksort returns an empty list for an input empty list
- The second expression works as follows. The input list is broken into its head and tail (x, xs). We use x as the pivot element. all elements smaller than x in xs are identified and sorted using quicksort itself (recursively). All elements greater than or equal to x are sorted by quicksort itself (recursively). The final result is smaller + [x] + bigger.
So here are some examples:
*Main> quicksort [2, 1, 3]
[1,2,3]
*Main> quicksort "i don't believe you"
" 'bdeeeiilnootuvy"
*Main> quicksort "i love you"
" eiloouvy"
*Main> quicksort [50, 40, 30 , 10, 20, 30, 40, 50, 20]
[10,20,20,30,30,40,40,50,50]
*Main>
Saturday, February 21, 2009
testing code prettifier
So trying it out now.
print "Hello World"
for i in range(10):
print i,
Well this works nicely.
For more information about how to set it up, please follow it at:
http://lukabloga.blogspot.com/2008/10/to-test-new-highlighting.html
Tuesday, January 6, 2009
user interfaces and criticism
Be warned, this is not serious stuff !!
I happen to do user interface development most of the time. It so happens that I am never able to do something which satisfies everybody. At best I am able to do an OK job. I did get to learn some interesting things over the years.
No matter how much effort one puts in user interface development, its never enough.
Its much much easier to find out issues in user interfaces than to implement solutions which take care of them properly. So just relax.
Its much more comfortable to be in the role of a critic than in the role of implementer. So try hard to get into such a role.
If you don't like the criticism you get, just sit back and relax. Don't reply back. Just ignore it and chill.
You will not like the user interface you have done if you look back at it after a few weeks or few months time. Don't try to improve it!! Its a never ending vicious circle.