Please don’t hit me, Haskell does a great job of that already.
I love Haskell for the same reasons I love Dark Souls. Fantastic and inscrutable
lore, a great combat (er, type) system, a cliff-wall difficulty curve, and
unending punishment.
I want to collect some statistics from the GitHub API.
Step One - Stack
I download stack and start a project:
> cd /home/jack/programming && stack new github-stats && cd github-stats
Downloading template "new-template" to create project "github-stats" in github-stats/ ...
......
All done.
So far so good. Does it work?
> stack build && stack exec -- github-stats-exe
github-stats-0.1.0.0: configure
.....
Registering github-stats-0.1.0.0...
someFunc
Awww yisss. This is going to be so easy!
Step Two - HTTPS GET Request
Now I need to query the GitHub API. This isn't my first rodeo, so I generate a personal access token from GitHub and copy it to a local file. What query should I run first? How about the count of all ASM tetris repositories? Poking around the docs comes up with:
GET https://api.github.com/search/repositories?q=tetris+language:assembly&sort=stars&order=desc
User-Agent: steveshogren
Authorization: token PUT_TOKEN_HERE
{.. "total_count": 354}
Easy life. Now how do you GET a resource in Haskell? Ah,
Network.HTTP!
I copy the front page sample into src/Lib.hs
module Lib
    ( someFunc
    ) where

x = simpleHTTP (getRequest "https://www.github.com/") >>= fmap (take 100) . getResponseBody

someFunc :: IO ()
someFunc =
    print x
So simple! This is why I laugh at my NodeJS-loving friends!
> stack build
src/Lib.hs:5:5: Not in scope: ‘simpleHTTP’
src/Lib.hs:5:17: Not in scope: ‘getRequest’
src/Lib.hs:5:77: Not in scope: ‘getResponseBody’
Compilation failed.
Doesn’t compile. Durp, Hackage is a package library, so I need to add this to my cabal file. What is the name of the package? HTTP-4000? HTTP-4000.3.2? Nothing on Hackage seems to indicate what goes into the cabal file. I discover it is just HTTP through trial and error. I update my cabal file… in all three build-depends…?
  build-depends:       base >= 4.7 && < 5
                     , HTTP
Hrm, same error.
> stack build
src/Lib.hs:5:5: Not in scope: ‘simpleHTTP’
src/Lib.hs:5:17: Not in scope: ‘getRequest’
src/Lib.hs:5:77: Not in scope: ‘getResponseBody’
Compilation failed.
Oh, durp, I need an import. (WHY ISN’T THIS IN THE CODE SAMPLE?!) Also, print
doesn’t work on an IO action; I need to bind the result into putStrLn.
import Network.HTTP
x = simpleHTTP (getRequest "https://www.github.com/") >>= fmap (take 100) . getResponseBody
someFunc :: IO ()
someFunc = x >>= putStrLn
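Why print had to go can be shown without any network code. The sketch below swaps the HTTP call for a canned string; fetchBody and firstChars are hypothetical names used only for illustration, not part of the HTTP package.

```haskell
-- Stand-in for the HTTP call. `fetchBody` is a hypothetical name and the
-- string is made-up sample data, not a real response.
fetchBody :: IO String
fetchBody = return "<!DOCTYPE html><html>...</html>"

-- print :: Show a => a -> IO (), and there is no Show instance for
-- `IO String`, so `print fetchBody` is rejected by the type checker.
-- fmap reaches inside the action to trim the String it will produce.
firstChars :: Int -> IO String
firstChars n = fmap (take n) fetchBody

-- (>>=) runs the action and hands the resulting String to putStrLn.
main :: IO ()
main = firstChars 14 >>= putStrLn
```

The same shape applies to the real request: the value of type IO String is a recipe for getting a String, so it has to be run and bound before anything can print it.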
Here goes!!!
> stack build && stack exec -- github-stats-exe
github-stats-exe: user error (https not supported)
Wat. Further inspection of the docs shows a line WAAY DOWN in paragraph 5.
NOTE: This package only supports HTTP;
When playing Dark Souls (er, programming Haskell), sometimes the best move
is to run away. I search again. “haskell https request” returns
“http-conduit” as the best choice. After adding http-conduit to my cabal, I come
up with this beast without any surprises:
query :: IO String
query = do
    initReq <- parseUrl "https://api.github.com/search/repositories"
    let r = initReq
                { method = "GET"
                , requestHeaders = [(hUserAgent, "steveshogren")
                                  , (hAuthorization, "token PUT_TOKEN_HERE")]}
    let request = setQueryString [("q", Just "tetris+language:assembly")
                                 ,("order", Just "desc")
                                 ,("sort", Just "stars")] r
    manager <- newManager tlsManagerSettings
    res <- httpLbs request manager
    return . show . responseBody $ res

someFunc :: IO ()
someFunc = do
    query >>= putStrLn
Huzzah! Results! I’m getting back a monster string of json data.
"{\"total_count\":66, ….}"
Step Three - Parsing JSON
Time to parse this mega JSON string. Aeson seems to be the biggest contender. To use Aeson and get the total_count value from the response, I need the following additions:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics
import Data.Aeson
data ResultCount = ResultCount {
      total_count :: Int }
      deriving (Generic, Show)

instance ToJSON ResultCount
instance FromJSON ResultCount
ResultCount allows me to use decode from aeson instead of show to
parse the “total_count” from the JSON response into an Int. Sure enough, it
does!
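To see the decode step on its own, here is a minimal offline sketch. It assumes the aeson and bytestring packages are available; countFrom is a hypothetical helper name, and the JSON payloads are made-up samples rather than real GitHub responses.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON, decode)
import qualified Data.ByteString.Lazy as BL
import GHC.Generics (Generic)

-- Same record shape as in the post; the field name must match the JSON key.
data ResultCount = ResultCount
    { total_count :: Int }
    deriving (Generic, Show)

-- The Generic-derived instance reads the "total_count" key and ignores
-- any other keys in the object.
instance FromJSON ResultCount

-- `countFrom` is a hypothetical helper: decode gives a Maybe ResultCount,
-- and fmap pulls the Int out when parsing succeeded.
countFrom :: BL.ByteString -> Maybe Int
countFrom = fmap total_count . decode

main :: IO ()
main = do
    print (countFrom "{\"total_count\":66,\"items\":[]}")  -- well-formed sample
    print (countFrom "not json")                           -- malformed sample
```

Because decode returns a Maybe, malformed input or a missing total_count key shows up as Nothing instead of an exception, which is why the query in the full listing ends up with the type IO (Maybe Int).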
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}
module Lib
    ( someFunc
    ) where

import Control.Monad
import Network
import Network.HTTP.Conduit
import Network.HTTP.Types.Header
import GHC.Generics
import Data.Aeson

data ResultCount = ResultCount {
      total_count :: Int }
      deriving (Generic, Show)

instance ToJSON ResultCount
instance FromJSON ResultCount

query :: IO (Maybe Int)
query = do
    initReq <- parseUrl "https://api.github.com/search/repositories"
    let r = initReq
                { method = "GET"
                , requestHeaders = [(hUserAgent, "steveshogren")
                                  , (hAuthorization, "token PUT_TOKEN_HERE")]}
    let request = setQueryString [("q", Just "tetris+language:assembly")
                                 ,("order", Just "desc")
                                 ,("sort", Just "stars")] r
    manager <- newManager tlsManagerSettings
    res <- httpLbs request manager
    return . liftM total_count . decode . responseBody $ res

someFunc :: IO ()
someFunc = query >>= print
Puts out: Just 66. Success! Wait. 66 isn’t the same count I got when
running from the browser. Check again. Sure enough, the browser comes up with a
totally different count.
Maybe the query request isn’t correct? Adding a print request on line 31,
after building the request, shows:
Request {
  host = "api.github.com"
  port = 443
  secure = True
  requestHeaders = [("User-Agent","steveshogren"),("Authorization","token PUT_TOKEN_HERE")]
  path = "/search/repositories"
  queryString = "?q=tetris%2Blanguage%3Aassembly&order=desc&sort=stars"
  method = "GET"
  proxy = Nothing
  rawBody = False
  redirectCount = 10
  responseTimeout = Just (-3425)
  requestVersion = HTTP/1.1
}
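The queryString isn’t right! It encoded my + and : as %2B and %3A! After an hour of
reading through docs and researching URL encoding specs, it dawns on me: in a
query string, + is itself an encoded space. I had copied the already-encoded
form from the docs, so my + went out as a literal plus sign.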
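The mix-up is easier to see with a tiny hand-rolled encoder and decoder. This is only a sketch of the general percent-encoding idea using base, not the code http-conduit runs; encodeQueryValue and decodeQueryValue are hypothetical names, and the encoder only handles ASCII input.

```haskell
import Data.Char (chr, isAlphaNum, isAscii, ord, toUpper)
import Numeric (readHex, showHex)

-- Percent-encode everything except ASCII letters, digits and "-_.~".
-- ASCII-only sketch: real encoders convert to UTF-8 bytes first.
encodeQueryValue :: String -> String
encodeQueryValue = concatMap encodeChar
  where
    encodeChar c
        | isAscii c && (isAlphaNum c || c `elem` "-_.~") = [c]
        | otherwise = '%' : pad (map toUpper (showHex (ord c) ""))
    pad [d] = ['0', d]   -- keep two hex digits for small code points
    pad ds  = ds

-- Decode a query value the way a form-style parser would:
-- '+' means space, and %XY is the character with hex code XY.
decodeQueryValue :: String -> String
decodeQueryValue ('+' : rest) = ' ' : decodeQueryValue rest
decodeQueryValue ('%' : a : b : rest)
    | [(n, "")] <- readHex [a, b] = chr n : decodeQueryValue rest
decodeQueryValue (c : rest) = c : decodeQueryValue rest
decodeQueryValue [] = []

main :: IO ()
main = do
    -- Value as copied from the docs: the '+' gets escaped to %2B.
    putStrLn (encodeQueryValue "tetris+language:assembly")
    -- Value with a real space: this is what the search was meant to be.
    putStrLn (encodeQueryValue "tetris language:assembly")
    -- The server side turns an unescaped '+' back into a space.
    putStrLn (decodeQueryValue "tetris+language:assembly")
```

Encoding the docs' string escapes the plus sign, so the server searches for a literal "tetris+language:assembly", while the decoded form of the docs' URL is the two-term query with a space in it.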
No face-palm gif could ever represent the sheer magnitude of my current emotions… You’ll have to use your imagination.
I change my query to "tetris language:assembly" and the right
count comes back! Just 354
I finally have something that correctly fetches a count of repositories from
GitHub and parses it into an Int. After over four hours of Dark
Souls (er, Haskell) punishment, we deserve to enjoy a bonfire!
Edit: Bonus Round!
Thanks to Chris Allen and /u/JeanParker for pointing me towards wreq, which weirdly
didn’t come up when I looked around for libs yesterday. Yep, it was 6th on
Google when searching for “haskell https get”. Network.HTTP takes the
top three results, and that doesn’t even do https.
¯\_(ツ)_/¯
Armed with their helpful suggestions, I knocked this out this morning.
{-# LANGUAGE OverloadedStrings #-}
import Network.Wreq
import Control.Lens
import Data.Aeson
import Data.Aeson.Lens
import qualified Data.Text as T
import qualified Data.ByteString.Char8 as BS

opts :: String -> String -> Options
opts lang token = defaults & param "q" .~ [T.pack $ "tetris language:" ++ lang]
                           & param "order" .~ ["desc"]
                           & param "sort" .~ ["stars"]
                           & header "Authorization" .~ [BS.pack $ "token " ++ token]

query lang = do
    token <- readFile "token"
    r <- getWith (opts lang token) "https://api.github.com/search/repositories"
    return $ r ^? responseBody . key "total_count" . _Number
MUCH better. This includes reading my token from a file called “token” so I don’t accidentally commit it. It also builds up the different query options based on inputs, which was the next step. Thanks y’all.
Pixel gifs sourced from zedotagger on deviantart, thanks zedotagger!