My ruby external-encoding hack

Reading (text)-files from disk is very easy in Ruby.

content = IO.read( "filename.txt" )

When you are trying to do something with this content you can get in trouble.

content.split(",")  # => invalid byte sequence in UTF-8

I've setup my environment very nicely, so Ruby treats external files as UTF-8. The trouble begins when you are trying to handle files that are encoded in the ISO-8859-1 or CP-1252 format and Ruby thinks they are UTF-8.

To accept both UTF-8 and ISO-8858-? formats I've implemented the following hack:

  def convert_to_utf8(content)
    if content.valid_encoding?
      content
    else
      content.force_encoding("ISO-8859-1").encode("UTF-8")
    end    
  end

  # reading the content:
  content = convert_to_utf8 IO.read( "filename.txt" ) 

This hack works for me because the text-files I use are in one of those formats.

Ruby on Rails / ChiliProject encoding issues

This week I've decided to exchange Redmine for the ChiliProject. The reason for this is the support for Ruby 1.9. My Apache Passenger server runs Ruby 1.9 so for Redmine I needed a seperate webserver.

When I tried to access the "My Account" page I recieved the following error:

ArgumentError (invalid byte sequence in US-ASCII):
  <internal:prelude>:10:in `synchronize'
  passenger (3.0.7) lib/phusion_passenger/rack/request_handler.rb:96:in `process_request'
  passenger (3.0.7) lib/phusion_passenger/abstract_request_handler.rb:513:in `accept_and_process_next_request'
  passenger (3.0.7) lib/phusion_passenger/abstract_request_handler.rb:274:in `main_loop'
  passenger (3.0.7) ...
`handle_spawn_application'
  passenger (3.0.7) lib/phusion_passenger/abstract_server.rb:357:in `server_main_loop'
  passenger (3.0.7) lib/phusion_passenger/abstract_server.rb:206:in `start_synchronously'
  passenger (3.0.7) helper-scripts/passenger-spawn-server:99:in `<main>'

Rendering /data/www/rails/chili/public/500.html (500 Internal Server Error)

Solution

How should I solve this? The chiliproject has an issue related to this: https://www.chiliproject.org/issues/591.

The following Apache configuration fixed the issue: (The sample is on a FreeBSD system)

I added the following code to a file in the /usr/local/apache22/envvars.d/environment.env

export LC_CTYPE="en_US.UTF-8"

Problems I ruled out or fixed

While trying I also made sure the following things were configured:

I made sure the database is UTF-8. I re-created the database
an ran the migrations again.

create database chiliproject character set utf8;

I used the mysql2 connector instead of the mysql connector in database.yml