Finding Unseen Unicode/ASCII Characters in Ruby
This morning, I thought I was losing my mind. I'm writing a little web app (mostly Angular) that makes API calls. I know the API works, but for some reason, the calls from my app to the API were getting a 500 error in response. I tailed the API logs and saw an "ArgumentError: argument out of range". However, the only thing that happened on that line was the date parsing. I opened up the Rails console and started debugging. First I typed out the date that wasn't working. It worked. Then I copied and pasted the same date from my browser. Failure.
irb(main):027:0> "2017-02-13T13:12:51Z".to_time(:utc)
=> 2017-02-13 13:12:51 UTC
irb(main):028:0> "2017‑02‑13T13:12:51Z".to_time(:utc)
ArgumentError: argument out of range
As you can see above, they look IDENTICAL. One of my coworkers suggested that I check the byte value of each character. Lucky for me, Ruby makes this easy.
"2017‑02‑13T13:12:51Z".each_byte do |c|
  puts c
end
==> 50 48 49 55 226 128 145 48 50 226 128 145 49 51 84 49 51 58 49 50 58 53 49 90
If you look at a chart of ASCII characters and values, you can see that 127 is the last ASCII value. The fifth character of my string starts with byte 226, so it can't be ASCII. The pattern 226, 128, 145 appears twice, in exactly the spots where the dashes are. Looking at a UTF-8 encoding table, I can see that this three-byte sequence (E2 80 91 in hex) encodes U+2011, the non-breaking hyphen, which is definitely breaking my API call. Mystery #1 of the morning? Solved.
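If you'd rather not scan a column of byte values by eye, here is a minimal sketch of how the same hunt could be automated in plain Ruby. The helper name `non_ascii_chars` is my own invention for illustration, not something from Rails or the standard library. It only relies on `String#ascii_only?`, `String#ord`, and `String#bytes`, and it assumes the string is UTF-8 encoded.

```ruby
# Hypothetical helper (not part of Ruby or Rails): list every non-ASCII
# character in a string with its position, code point, and UTF-8 bytes.
def non_ascii_chars(str)
  str.each_char.with_index.reject { |ch, _| ch.ascii_only? }.map do |ch, i|
    { index: i, codepoint: format("U+%04X", ch.ord), bytes: ch.bytes }
  end
end

# The pasted date, with the non-breaking hyphens written as escapes
# so they are visible in the source.
suspect = "2017\u201102\u201113T13:12:51Z"

non_ascii_chars(suspect).each { |hit| puts hit.inspect }

# One possible fix: swap the non-breaking hyphen for a plain ASCII hyphen
# before handing the string to the date parser.
cleaned = suspect.tr("\u2011", "-")
puts cleaned
```

For this string, the helper reports two hits, at indexes 4 and 7, both U+2011 with bytes 226, 128, 145. The `tr` call is just one way to clean the input, and it only handles this one character. Whether to normalize silently or reject the request is a judgment call for the API.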