For the chapter 5 JSON exercises I needed to update the prelude scheme library to add intercalate. I also wrote the following scheme files which are just ported versions of the haskell equivalents.
Data types – JValue
The main headache in attacking these exercises was the data type declarations. I took a very simple approach using PLT’s define-struct. This involved writing a whole lot of boilerplate code for constructors and type predicates. I think in the long run it would be a good idea to write some syntax extensions to deal with data declarations, something that lets you write a form like (data jvalue (jnumber number?) (jnull jnull?) ...) and generates the structure definition, constructors and type predicates for you. I’m too lazy to bother for now, but I’ll probably add it to the prelude later.
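One way the data form imagined above could look, as a rough sketch only. It assumes a recent Racket (for racket/syntax and format-id) rather than the PLT scheme used for the rest of this post, and the names data and tagged and the "bad payload" error text are made up for illustration. Each clause here pairs a constructor name with a predicate for its payload, which is one possible reading of the clauses and not necessarily the one the prelude will end up with.

```racket
#lang racket
(require (for-syntax racket/base racket/syntax))

;; One shared structure: a type tag plus the wrapped scheme value.
(define-struct tagged (type data))

;; (data type-name (ctor payload-ok?) ...) defines, for each clause,
;; a checking constructor `ctor` and a type predicate `ctor?`.
;; type-name is only documentation in this sketch.
(define-syntax (data stx)
  (syntax-case stx ()
    [(_ type-name (ctor payload-ok?) ...)
     (with-syntax ([(pred ...)
                    ;; build jnumber? from jnumber, and so on
                    (map (λ (c) (format-id c "~a?" c))
                         (syntax->list #'(ctor ...)))])
       #'(begin
           (define (ctor v)
             (if (payload-ok? v)
                 (make-tagged 'ctor v)
                 (error 'ctor "bad payload: ~e" v)))
           ...
           (define (pred x)
             (and (tagged? x) (eq? (tagged-type x) 'ctor)))
           ...))]))

(data jvalue
  (jnumber number?)
  (jstring string?))

(jnumber? (jnumber 42))   ; #t
(jstring? (jnumber 42))   ; #f
```

Container types like jarray would still need hand-written predicates that walk their elements, so a macro like this only removes the boilerplate for the simple cases.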
The first step was to define the jvalue types. As I said, I used define-struct for the jvalue type. The simple constructors for jstring, jnull, jbool and jnumber are not too interesting, but jobject and jarray need a little extra effort. I thought about using an assoc list but I plumped for a hash table as the underlying storage for jobjects. As scheme isn’t statically typed, the type predicate jarray? ought to be defensive and check that its elements are all jvalue types. Luckily define-struct automatically generates a type predicate, jvalue?, which helps here, so internally a jarray is a simple list of jvalue elements.
;; A jvalue is a type tag plus the underlying scheme data.
(define-struct jvalue (type data))
(define (jtype-eq? value jtype)
  (and (jvalue? value)
       (eq? jtype (jvalue-type value))))
;; hash-map takes the table first and returns a list of results;
;; andmap with values then checks that every result is true.
(define (jobject? value)
  (and (jtype-eq? value 'jobject)
       (andmap values
               (hash-map (jvalue-data value)
                         (λ (k v)
                           (and (string? k) (jvalue? v)))))))
(define (jobject value)
  (if (hash? value)
      (make-jvalue 'jobject value)
      (error "jobject expects type hash : " value)))
;; Defensive: every element of the underlying list must be a jvalue.
(define (jarray? x)
  (and (jtype-eq? x 'jarray)
       (andmap jvalue? (jvalue-data x))))
(define (jarray value)
  (if (list? value)
      (make-jvalue 'jarray value)
      (error "jarray expects type list : " value)))
The Doc data type and values
I took an identical approach to creating the doc data type – with the added need to deconstruct Union and Concat values with simple accessor procedures.
(define (concat? value) (doc-type=? value 'concat))
(define (concat l r) (make-doc 'concat (cons l r)))
(define (concat-l c) (car (doc-data c)))
(define (concat-r c) (cdr (doc-data c)))
(define (union? value) (doc-type=? value 'union))
(define (union l r) (make-doc 'union (cons l r)))
(define (union-l c) (car (doc-data c)))
(define (union-r c) (cdr (doc-data c)))
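The accessors above lean on make-doc, doc-data and doc-type=?, which aren't shown. A minimal sketch of what they might look like, assuming the doc structure mirrors jvalue (a type tag plus a payload). The text constructor is hypothetical and only there so the example has something to combine.

```racket
#lang racket

;; Assumed definition, mirroring jvalue: a type tag plus a payload.
(define-struct doc (type data))

;; Assumed helper: true when value is a doc carrying the given tag.
(define (doc-type=? value type)
  (and (doc? value)
       (eq? type (doc-type value))))

;; Hypothetical leaf constructor, used only for the example below.
(define (text s) (make-doc 'text s))

(define (concat? value) (doc-type=? value 'concat))
(define (concat l r) (make-doc 'concat (cons l r)))
(define (concat-l c) (car (doc-data c)))
(define (concat-r c) (cdr (doc-data c)))

;; Build a concat of two leaves and take it apart again.
(define d (concat (text "[") (text "]")))
(concat? d)               ; #t
(doc-data (concat-l d))   ; "["
(doc-data (concat-r d))   ; "]"
```

Storing the two halves as a single pair keeps the doc structure at two fields, at the cost of needing the car and cdr accessors.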
There wasn’t much else of interest in writing nest or fill in the general pretty-printing exercises, just a straightforward port of the haskell versions.
Pretty printing JSON unicode characters
The other point worth noting was the pretty-printing of json string values. The full unicode spec is a bit over my head, as are the various implementations and encodings, especially as the haskell functions also handle astral characters (code points above U+FFFF), which I had to look up.
The first thing I needed to do was to find the correct way of representing the various simple escape character literals. I couldn’t find them searching the PLT scheme documentation. Still, it was easy enough to find a given character using, for example, (string-ref "\f" 0), and so I eventually constructed an alist which I’m posting here for posterity and my own future reference.
(define simple-escapes
  '((#\backspace . "\\b")
    (#\newline . "\\n")
    (#\page . "\\f")
    (#\return . "\\r")
    (#\tab . "\\t")
    (#\\ . "\\\\")
    (#\" . "\\\"")
    (#\/ . "\\/")))
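A minimal sketch of how an alist like this can be used to escape a string. The names escape-char and escape-string are made up for illustration, and the alist is cut down to four entries to keep the example short. Characters with no entry are passed through unchanged, which ignores the control and non-ascii characters a full JSON printer would also have to escape.

```racket
#lang racket

;; A shortened copy of the alist, so the sketch stands on its own.
(define simple-escapes
  '((#\newline . "\\n")
    (#\tab . "\\t")
    (#\\ . "\\\\")
    (#\" . "\\\"")))

;; Look a character up in the alist; characters without a simple
;; escape come back as a one-character string.
(define (escape-char c)
  (let ([entry (assv c simple-escapes)])
    (if entry
        (cdr entry)
        (string c))))

;; Escape every character of a string and join the pieces.
(define (escape-string s)
  (apply string-append (map escape-char (string->list s))))

(escape-string "a\tb")   ; => "a\\tb"
```

assv is enough here because characters with the same code point are eqv?.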
A couple of nice things I discovered (at least in mzscheme) were support for unicode hex character literals and for formatting them. For example, #\uFF is the character literal for code point 255 (ÿ), which is a Latin-1 character rather than an ascii one, since ascii stops at 127.
(format "~c" #\page) => "\f"
(format "~c" #\uFF) => "ÿ"
(format "~c" #\uFFFF) => "\uFFFF"
Unfortunately there seems to be disagreement about formatting astral characters. PLT scheme uses a single literal of the form #\Un, where n is a run of hex digits long enough to hold the whole code point, rather than a pair of 16 bit values.
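The haskell astral values are instead written as a pair of \u escapes, which is the UTF-16 surrogate pair encoding. With n = code point - 0x10000:
\uA\uB : A = upper 10 bits of n + 0xd800, B = lower 10 bits of n + 0xdc00
It’s all there in the unicode spec. So I decided to follow the haskell version, which is also the correct form for JSON values: the JSON spec (RFC 4627, section 2.5) says that a character outside the Basic Multilingual Plane is written as a twelve-character sequence encoding the UTF-16 surrogate pair.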
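A minimal sketch of the A/B arithmetic in scheme. The names small-hex and astral are borrowed from the haskell version, but this is an illustration under stated assumptions, not the ported code itself. It assumes the argument is a full code point above U+FFFF and subtracts 0x10000 itself before splitting the remaining 20 bits.

```racket
#lang racket

;; Format n as \u followed by four lowercase hex digits.
(define (small-hex n)
  (let ([h (number->string n 16)])
    (string-append "\\u"
                   (make-string (max 0 (- 4 (string-length h))) #\0)
                   h)))

;; Split an astral code point into a UTF-16 surrogate pair.
(define (astral code-point)
  (let* ([n (- code-point #x10000)]                        ; 20 bits remain
         [a (bitwise-and (arithmetic-shift n -10) #x3ff)]  ; upper 10 bits
         [b (bitwise-and n #x3ff)])                        ; lower 10 bits
    (string-append (small-hex (+ a #xd800))
                   (small-hex (+ b #xdc00)))))

(astral #x1D11E)   ; => "\\ud834\\udd1e"
```

U+1D11E is the G clef character, and \ud834\udd1e is the escaped form the JSON RFC gives for it.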