Bryan O'Sullivan, Donald Bruce Stewart, John Goerzen. Real World Haskell, 1st Edition. O'Reilly Media, Inc.

10.2. Parsing a Raw PGM File

For our first try at a parsing function, we'll only worry about raw PGM files. We'll write our PGM parser as a pure function. It won't be responsible for obtaining the data to parse, just for the actual parsing. This is a common approach in Haskell programs. By separating the reading of the data from what we subsequently do with it, we gain flexibility in where we take the data from.

We'll use the ByteString type to store our graymap data, because it's compact. Since the header of a PGM file is ASCII text but its body is binary, we import both the text- and binary-oriented ByteString modules:

    -- file: ch10/PNM.hs
    import qualified Data.ByteString.Lazy.Char8 as L8
    import qualified Data.ByteString.Lazy as L
    import Data.Char (isSpace)
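To see why both modules are useful together, here is a minimal sketch that builds a tiny raw PGM image in memory. It is not part of PNM.hs: the names sampleHeader, sampleBody, and samplePGM, and the 2x2 pixel values, are made up for illustration. The point is only that the two modules offer different views (characters versus bytes) of the same lazy ByteString type.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L8
import qualified Data.ByteString.Lazy as L

-- Hypothetical sample data: the header of a 2x2 raw PGM image.
-- The header is ASCII text, so we build it with the Char8 module.
sampleHeader :: L.ByteString
sampleHeader = L8.pack "P5 2 2 255\n"

-- The body is raw bytes, so we build it with the Word8-oriented module.
sampleBody :: L.ByteString
sampleBody = L.pack [0, 85, 170, 255]

-- Both modules operate on the same lazy ByteString type, so the
-- two pieces can be joined directly.
samplePGM :: L.ByteString
samplePGM = L.append sampleHeader sampleBody
```

Because a parser written as a pure function only needs a ByteString, a value like samplePGM built in memory should serve it just as well as data obtained from a file, for instance via L.readFile.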
For our purposes, it doesn't matter whether we use a lazy or strict ByteString, so we've somewhat arbitrarily chosen the lazy kind.

We'll use a straightforward data type to represent PGM images:

    -- file: ch10/PNM.hs
    data Greymap = Greymap {
          greyWidth  :: Int
        , greyHeight :: Int
        , greyMax    :: Int
        , greyData   :: L.ByteString
        } deriving (Eq)
Normally, a Haskell Show instance should produce a string representation that we can read back by calling read. However, for a bitmap graphics file, this would potentially produce huge text strings, for example, if we were to show a photo. For this reason, we're not going to let the compiler automatically derive a Show instance for us; we'll write our own and intentionally simplify it:

    -- file: ch10/PNM.hs
    instance Show Greymap where
        show (Greymap w h m _) = "Greymap " ++ show w ++ "x" ++ show h ++
                                 " " ++ show m
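As a quick illustration of what this instance produces, the following self-contained sketch repeats the type and instance from above and applies show to a made-up image. The names tiny and rendered, and the pixel values, are hypothetical and exist only for this example.

```haskell
import qualified Data.ByteString.Lazy as L

data Greymap = Greymap {
      greyWidth  :: Int
    , greyHeight :: Int
    , greyMax    :: Int
    , greyData   :: L.ByteString
    } deriving (Eq)

instance Show Greymap where
    show (Greymap w h m _) = "Greymap " ++ show w ++ "x" ++ show h ++
                             " " ++ show m

-- Hypothetical 2x2 image used only for illustration.
tiny :: Greymap
tiny = Greymap 2 2 255 (L.pack [0, 85, 170, 255])

-- The rendered string carries the dimensions and maximum grey value,
-- but none of the pixel data.
rendered :: String
rendered = show tiny   -- "Greymap 2x2 255"
```

The four bytes of pixel data are still present in the value (greyData tiny returns them); they are simply left out of the printed form.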
Because our Show instance intentionally avoids printing the bitmap data, there's no point in writing a Read instance, as we can't reconstruct a valid Greymap from the result of show.

Here's an obvious type for our parsing function:

    -- file: ch10/PNM.hs
    parseP5 :: L.ByteString -> Maybe (Greymap, L.ByteString)
This will take a ByteString, and if the parse succeeds, it will return a single parsed Greymap, along with the string that remains after parsing. That residual string will be available for future parses.

Our parsing function has to consume a little bit of its input at a time. First, we need to assure ourselves that we're really looking at a raw PGM file; then we need to parse the numbers from the remainder of the header; and then we consume the bitmap data. Here's an obvious way to express this, which we will use as a base for later improvements:

    -- file: ch10/PNM.hs
    matchHeader :: L.ByteString -> L.ByteString -> Maybe L.ByteString

    -- "nat" here is short for "natural number"
    getNat :: L.ByteString -> Maybe (Int, L.ByteString)

    getBytes :: Int -> L.ByteString
             -> Maybe (L.ByteString, L.ByteString)

    parseP5 s =
      case matchHeader (L8.pack "P5") s of
        Nothing -> Nothing
        Just s1 ->
          case getNat s1 of
            Nothing -> Nothing
            Just (width, s2) ->
              case getNat (L8.dropWhile isSpace s2) of
                Nothing -> Nothing
                Just (height, s3) ->
                  case getNat (L8.dropWhile isSpace s3) of
                    Nothing -> Nothing
                    Just (maxGrey, s4)
                      | maxGrey > 255 -> Nothing
                      | otherwise ->
                          case getBytes 1 s4 of
                            Nothing -> Nothing
                            Just (_, s5) ->
                              case getBytes (width * height) s5 of
                                Nothing -> Nothing
                                Just (bitmap, s6) ->
                                  Just (Greymap width height maxGrey bitmap, s6)
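The way each step hands its leftover input to the next can be seen in miniature with a two-step parse. The sketch below is not part of PNM.hs: parseDims is a hypothetical helper, and instead of getNat it uses L8.readInt from the bytestring library, which has the same "result plus residual" shape. It assumes, as parseP5 does, that whitespace between numbers has to be skipped explicitly.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L8
import Data.Char (isSpace)

-- Hypothetical helper, not part of PNM.hs: read two whitespace-separated
-- integers, returning them together with whatever input is left over.
parseDims :: L8.ByteString -> Maybe ((Int, Int), L8.ByteString)
parseDims s =
    case L8.readInt s of
      Nothing -> Nothing                  -- no first number: give up
      Just (w, s1) ->                     -- s1 is the input after the first number
          case L8.readInt (L8.dropWhile isSpace s1) of
            Nothing -> Nothing            -- no second number: give up
            Just (h, s2) -> Just ((w, h), s2)
```

Applied to L8.pack "640 480\nrest", this should yield the pair (640, 480) with the residual "\nrest"; applied to L8.pack "640 x", the second step fails and the whole result is Nothing. Every additional field adds one more level of nesting, which is exactly the shape parseP5 takes.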
The parseP5 function is a very literal piece of code, performing all of the parsing in one long staircase of case expressions. Each function returns the residual ByteString left over after it has consumed all it needs from its input string. We pass each residual string along to the next step. We deconstruct each result in turn, either returning Nothing if the parsing step fails, or building up a piece of the final result as we proceed. Here are the bodies of the functions that we apply during parsing (their types are commented out because we already presented them):