Encoding, CSV and Ruby

Ruby doesn’t like strings that are not UTF-8 encoded. CSV files are usually a bunch of data coming from somewhere else, and most of the time they are not UTF-8 encoded, so you can expect problems when you try to read them. I fought against encoding problems for a long time, and I have now found how to avoid the major ones. I’m very proud of this (because of the many headaches… :-/ ).

When you read a CSV file, you can pass the :encoding option to set the source and destination encodings (format: “source:destination”), so that the data reaches the CSV engine already converted:

require "csv"
CSV.foreach("file.csv", encoding: "iso-8859-1:UTF-8") do |row|
  # use row here...
end
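To see the :encoding option at work without a real export at hand, here is a minimal sketch. The sample data and the temporary file are made up for illustration; the only thing taken from above is the "iso-8859-1:UTF-8" option.

```ruby
require "csv"
require "tempfile"

# Write a tiny ISO-8859-1 file that stands in for a real Latin-1 export.
# \xE9 and \xE1 are the Latin-1 bytes for "é" and "á" (made-up sample data).
latin1_file = Tempfile.new(["sample", ".csv"])
latin1_file.binmode
latin1_file.write("name,city\nJos\xE9,M\xE1laga\n".b)
latin1_file.close

# Read it back, letting Ruby transcode from ISO-8859-1 to UTF-8 on the fly.
rows = []
CSV.foreach(latin1_file.path, encoding: "iso-8859-1:UTF-8") do |row|
  rows << row
end

latin1_file.unlink
```

Every field in rows should now be a valid UTF-8 string, so later regular expressions and string operations behave normally.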

If your resource is not a file but a String or a file handle, you need to convert it before handing it to the CSV engine. The standard String#force_encoding method does not help here, because it only relabels the bytes without converting or validating them:

a = "\xff"
a.force_encoding "utf-8"
a.valid_encoding?
# => false
a =~ /x/
# => raises ArgumentError: invalid byte sequence in UTF-8

You must use the String#encode! method with the :invalid => :replace option to get things done:

a = "\xff"
a.encode!("utf-8", "utf-8", :invalid => :replace)
a.valid_encoding?
# => true
a =~ /x/
# => nil (no match, but no exception either)
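The difference between relabeling and converting can be seen side by side in the sketch below. It assumes Ruby 2.1 or later, where String#scrub is available and where encode with the same source and destination encoding actually replaces invalid bytes; the variable names are only for illustration.

```ruby
# A lone 0xFF byte is never valid in UTF-8.
broken = "\xff".b.force_encoding("UTF-8")

# force_encoding only changes the label, so the string stays invalid.
relabeled_ok = broken.valid_encoding?

# When the bytes really are Latin-1, name the source encoding and convert:
# 0xE9 is "é" in ISO-8859-1.
converted = "caf\xE9".b.encode("UTF-8", "ISO-8859-1")

# When the source encoding is unknown, replace the invalid bytes instead.
# With the default replacement, bad bytes become U+FFFD.
cleaned = broken.encode("UTF-8", "UTF-8", invalid: :replace)

# String#scrub does the same job and lets you choose the replacement.
scrubbed = broken.scrub("?")
```

Converting from the real source encoding keeps the original characters, while replacing invalid bytes loses them, so the second option is best kept as a fallback.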

So using an external resource:

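require "csv"
require "open-uri"
# Since Ruby 3.0, Kernel#open no longer opens URLs; open-uri provides URI.open.
handler = URI.open("http://www.example.com/file.csv")
csv_string = handler.read.encode!("UTF-8", "iso-8859-1", invalid: :replace)
CSV.parse(csv_string) do |row|
  # use row here...
end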
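The same read-then-convert step should work for any object that responds to read, not only for a remote resource. Here is a small sketch that uses a StringIO as a stand-in for the handle; the parse_latin1_csv helper and the sample data are hypothetical names made up for this example.

```ruby
require "csv"
require "stringio"

# Hypothetical helper: read everything from an IO-like object, treat the
# bytes as ISO-8859-1, convert them to UTF-8, then parse the result as CSV.
def parse_latin1_csv(io)
  utf8 = io.read.encode("UTF-8", "ISO-8859-1", invalid: :replace, undef: :replace)
  CSV.parse(utf8)
end

# StringIO stands in for a network or file handle; \xE9 is "é" in Latin-1.
handle = StringIO.new("id,name\n1,Andr\xE9\n".b)
rows = parse_latin1_csv(handle)
```

Passing the source encoding explicitly to encode means the label the string arrived with does not matter, which is handy when the handle reports the wrong charset.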


  • Vermin

    “If your resource is not a file but a String or a file handler you need to convert it before use CSV engine” Is Tempfile a file handler? Because I can’t use the :encoding option when I do CSV.new(open(http://..), …)

    • I never tested it, but a Tempfile is probably similar to an external resource. So you probably need to convert it to a string (by reading it) and then convert the encoding with encode!. Let me know if it works 😉

  • TomNext

    This really helped me out. Thanks for writing it up.

    • It’s a pleasure to know that my post was able to save someone from another headache 🙂

  • littleforest

    Very helpful, thank you! I appreciate your thoroughness in going through the different cases of string, file, external resource.