Ruby is strict about strings that are not valid UTF-8. CSV files usually contain data exported from somewhere else, and most of the time they are not UTF-8 encoded, so you can expect problems when you try to read them. I fought encoding problems for a long time, and I have finally found how to avoid the major ones. I’m very proud of this (it cost me many headaches… :-/ ).

When you read a CSV file, you can pass the :encoding option to set the source and destination encodings (format: "source:destination"), so that the CSV engine receives the data already converted:

require "csv"

CSV.foreach("file.csv", encoding: "iso-8859-1:UTF-8") do |row|
  # use row here...
end
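
Here is a self-contained sketch of the same call that you can run without having a file at hand. The sample rows and the temporary file are made up for illustration; only the encoding: option comes from the snippet above.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data, written to disk as ISO-8859-1 bytes.
# "\u00E9" is e-acute and "\u00E1" is a-acute.
latin1_bytes = "name,city\nJos\u00E9,M\u00E1laga\n".encode("ISO-8859-1")

rows = []
Tempfile.create(["sample", ".csv"]) do |file|
  file.binmode
  file.write(latin1_bytes)
  file.close

  # "iso-8859-1:UTF-8" reads as: the file on disk is ISO-8859-1,
  # the strings handed to the block should be UTF-8.
  CSV.foreach(file.path, encoding: "iso-8859-1:UTF-8") do |row|
    rows << row
  end
end

rows.each { |row| puts row.inspect }
```

Every field collected in rows is a valid UTF-8 string, so regular expressions and string comparisons work on it without raising.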

If your resource is not a file but a String or a file handle, you need to convert it before handing it to the CSV engine. The standard String#force_encoding method does not do the job, because it only changes the encoding label of the string and leaves the bytes untouched:

a = "\xff"
a.force_encoding "utf-8"
a.valid_encoding?
# => returns false
a =~ /x/
# => provokes ArgumentError: invalid byte sequence in UTF-8
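
To see the difference between relabelling and converting, here is a small sketch that applies both to the same byte. The byte 0xE9 and the variable names are my own illustration, and it assumes the data really is ISO-8859-1.

```ruby
# One raw byte, 0xE9, which is e-acute in ISO-8859-1.
raw = "\xE9".b

# force_encoding only changes the label: the byte stays the same,
# and a lone 0xE9 is not a complete UTF-8 sequence.
relabelled = raw.dup.force_encoding("UTF-8")

# encode rewrites the bytes: 0xE9 in ISO-8859-1 becomes the
# two-byte UTF-8 sequence 0xC3 0xA9.
transcoded = raw.encode("UTF-8", "ISO-8859-1")

puts relabelled.bytes.inspect    # [233]
puts relabelled.valid_encoding?  # false
puts transcoded.bytes.inspect    # [195, 169]
puts transcoded.valid_encoding?  # true
```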

Use the String#encode! method instead to get things done:

a = "\xff"
a.encode!("utf-8", "utf-8", :invalid => :replace)
a.valid_encoding?
# => returns true now
a =~ /x/
# => works now (returns nil instead of raising)

So, when reading from an external resource:

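require "csv"
require "open-uri"

# URI.open comes from open-uri; plain Kernel#open no longer opens URLs on Ruby 3.0+
handler = URI.open("http://www.example.com/file.csv")
csv_string = handler.read.encode!("UTF-8", "iso-8859-1", invalid: :replace)
CSV.parse(csv_string) do |row|
  # use row here...
end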
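
If you do this in more than one place, the conversion can be wrapped in a small helper. This is only a sketch: parse_csv_as_utf8 is a hypothetical name of mine, the default source encoding is an assumption you should adjust to your data, and a StringIO stands in for the remote handler so the example runs offline.

```ruby
require "csv"
require "stringio"

# Hypothetical helper (not part of Ruby or the CSV library): accepts a String
# or anything that responds to #read and returns the parsed rows as UTF-8.
def parse_csv_as_utf8(source, source_encoding = "iso-8859-1")
  raw = source.respond_to?(:read) ? source.read : source
  # Transcode from the assumed source encoding; bytes that cannot be
  # converted are replaced instead of raising.
  utf8 = raw.encode("UTF-8", source_encoding, invalid: :replace, undef: :replace)
  CSV.parse(utf8)
end

# Stand-in for a file handle or an open-uri handler: raw ISO-8859-1 bytes.
handler = StringIO.new("id,name\n1,Jos\xE9\n".b)
rows = parse_csv_as_utf8(handler)

rows.each { |row| puts row.inspect }
```

Passing the source encoding explicitly to encode means the label that the String happens to carry does not matter, which is useful when the bytes arrive tagged as binary.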

Sources: