OCRing in Style

Jarmo Pertman, 03 Jan 2014

Recently, during our weekly Technology eXchangE (TeX) meeting, Erik organized a short Code Kata event. The problem we had to solve was "OCR". We had to parse input similar to this:

          _  _  _  _  _  _
  |  |  ||_ |_ |_| _| _|  |
  |  |  ||_||_| _||_ |_   |

 _     _  _  _  _  _  _  _
|_||_||_   ||_  _|  || |  |
|_|  ||_|  | _| _|  ||_|  |

 _     _     _  _     _  _
|_   ||_|  | _| _||_||_   |
|_|  ||_|  | _||_   | _|  |

Of course we did it in our regular pair-programming TDD fashion.

I decided to use Ruby as the language of choice, both to have more fun and to teach it to my partner in crime! Here is the result we produced during that hour (with tests, of course!):

# ocr_spec.rb
require File.expand_path("ocr", File.dirname( __FILE__ ))


describe Ocr do
  context ".parse" do
    it "parses 1" do
      Ocr.parse(
"   
  |
  |
   ").should == 1
    end


    it "parses 2" do
      Ocr.parse(
" _ 
 _|
|_ 
   ").should == 2
    end


    it "parses 3" do
      Ocr.parse(
" _ 
 _|
 _|
   ").should == 3
    end


    it "parses 4" do
      Ocr.parse(
"   
|_|
  |
   ").should == 4
    end


    it "parses 5" do
      Ocr.parse(
" _ 
|_ 
 _|
   ").should == 5
    end


    it "parses 6" do
      Ocr.parse(
" _ 
|_ 
|_|
   ").should == 6
    end


    it "parses 7" do
      Ocr.parse(
" _ 
  |
  |
   ").should == 7
    end


    it "parses 8" do
      Ocr.parse(
" _ 
|_|
|_|
   ").should == 8
    end


    it "parses 9" do
      Ocr.parse(
" _ 
|_|
 _|
   ").should == 9
    end


    it "parses 0" do
      Ocr.parse(
" _ 
| |
|_|
   ").should == 0
    end
  end


  context ".parse_file" do
    it "single line" do
      Ocr.parse_file("test_file_single_line.txt").should == [111669227]
    end


    it "multiple line file" do
      Ocr.parse_file("test_file_multi_line.txt").should == [111669227, 846753707]
    end
  end  
end




# test_file_single_line.txt
          _  _  _  _  _  _
  |  |  ||_ |_ |_| _| _|  |
  |  |  ||_||_| _||_ |_   |


# test_file_multi_line.txt
          _  _  _  _  _  _
  |  |  ||_ |_ |_| _| _|  |
  |  |  ||_||_| _||_ |_   |

 _     _  _  _  _  _  _  _
|_||_||_   ||_  _|  || |  |
|_|  ||_|  | _| _|  ||_|  |
                           
                           


# ocr.rb
class Ocr
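  # Each entry is one glyph: four 3-character rows joined into a
  # 12-character string. The array index is the digit it represents.
  MAPPINGS = [
    " _ | ||_|   ",
    "     |  |   ",
    " _  _||_    ",
    " _  _| _|   ",
    "   |_|  |   ",
    " _ |_  _|   ",
    " _ |_ |_|   ",
    " _   |  |   ",
    " _ |_||_|   ",
    " _ |_| _|   ",
  ]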


  def self.parse(input)
    MAPPINGS.index input.split($/).join
  end


  def self.parse_file(file_name)
    File.readlines(file_name)
      .each_slice(4)
      .map {|four_lines| four_lines
                           .map {|line| line
                                          .chomp
                                          .chars
                                          .each_slice(3)
                                          .to_a
                           }
                             .transpose
                             .map {|number_string| parse number_string.map(&:join)
                                                     .join($/) }.join.to_i }
  end
end

The code above is definitely not production-ready! As you can see, it is mostly a one-liner. Good luck deciphering that! Here's some reference material to help you out: Array#index, String#split, Array#join, IO.readlines, Enumerable#each_slice, Enumerable#map, String#chomp, String#chars, Enumerable#to_a, Array#transpose and String#to_i.
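
If you would rather not decipher it alone, here is a rough step-by-step sketch of what the chain in parse_file does to one group of four lines. It is not the code we wrote during the kata: GLYPHS, CELL and decode_group are made-up names for illustration, the input is an in-memory array instead of a file, and the padding step is an extra assumption added so that lines with stripped trailing spaces still work.

```ruby
# Hypothetical walk-through of the parse_file pipeline.
# GLYPHS, CELL and decode_group are illustrative names, not part of ocr.rb.
GLYPHS = [
  " _ | ||_|   ", # 0
  "     |  |   ", # 1
  " _  _||_    ", # 2
  " _  _| _|   ", # 3
  "   |_|  |   ", # 4
  " _ |_  _|   ", # 5
  " _ |_ |_|   ", # 6
  " _   |  |   ", # 7
  " _ |_||_|   ", # 8
  " _ |_| _|   "  # 9
].freeze

CELL = 3 # every digit is three characters wide

def decode_group(four_lines)
  lines = four_lines.map(&:chomp)

  # Assumption: pad all lines to a common width (a multiple of CELL),
  # in case an editor stripped the trailing spaces.
  width = (lines.map(&:length).max.to_f / CELL).ceil * CELL

  # Step 1: cut each line into 3-character cells => 4 rows of N cells.
  rows = lines.map { |line| line.ljust(width).chars.each_slice(CELL).map(&:join) }

  # Step 2: transpose => N columns, each holding the 4 rows of one digit.
  columns = rows.transpose

  # Step 3: glue each column into a 12-character key, look up its index
  # in GLYPHS, then join the digits into one number.
  columns.map { |cells| GLYPHS.index(cells.join) }.join.to_i
end

sample = [
  "    _  _",
  "  | _| _|",
  "  ||_  _|",
  ""
]

puts decode_group(sample)
```

The transpose in step 2 is the heart of the trick: the file stores the glyphs row by row, while the lookup needs them digit by digit. Note that an unrecognised glyph makes Array#index return nil, which simply disappears in the join, so a malformed digit silently shortens the number instead of raising an error.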