Jarmo Pertman
03 Jan 2014

OCRing in Style


Recently, during our weekly Technology eXchangE (TeX) meeting, Erik organized a short Code Kata event. The problem we had to solve was "OCR". We had to parse input similar to this:

          _  _  _  _  _  _
  |  |  ||_ |_ |_| _| _|  |
  |  |  ||_||_| _||_ |_   |

 _     _  _  _  _  _  _  _
|_||_||_   ||_  _|  || |  |
|_|  ||_|  | _| _|  ||_|  |

 _     _     _  _     _  _
|_   ||_|  | _| _||_||_   |
|_|  ||_|  | _||_   | _|  |

Of course we did it in our regular pair-programming TDD fashion.

I decided to use Ruby as my language of choice to have more fun and also to teach it to my partner in crime! Here is the result we produced during that hour (with tests, of course!):


# ocr_spec.rb
require File.expand_path("ocr", File.dirname( __FILE__ ))


describe Ocr do
  context ".parse" do
    it "parses 1" do
      Ocr.parse(
"   
  |
  |
   ").should == 1
    end


    it "parses 2" do
      Ocr.parse(
" _ 
 _|
|_ 
   ").should == 2
    end


    it "parses 3" do
      Ocr.parse(
" _ 
 _|
 _|
   ").should == 3
    end


    it "parses 4" do
      Ocr.parse(
"   
|_|
  |
   ").should == 4
    end


    it "parses 5" do
      Ocr.parse(
" _ 
|_ 
 _|
   ").should == 5
    end


    it "parses 6" do
      Ocr.parse(
" _ 
|_ 
|_|
   ").should == 6
    end


    it "parses 7" do
      Ocr.parse(
" _ 
  |
  |
   ").should == 7
    end


    it "parses 8" do
      Ocr.parse(
" _ 
|_|
|_|
   ").should == 8
    end


    it "parses 9" do
      Ocr.parse(
" _ 
|_|
 _|
   ").should == 9
    end


    it "parses 0" do
      Ocr.parse(
" _ 
| |
|_|
   ").should == 0
    end
  end


  context ".parse_file" do
    it "single line" do
      Ocr.parse_file("test_file_single_line.txt").should == [111669227]
    end


    it "multiple line file" do
      Ocr.parse_file("test_file_multi_line.txt").should == [111669227, 846753707]
    end
  end  
end




# test_file_single_line.txt
          _  _  _  _  _  _
  |  |  ||_ |_ |_| _| _|  |
  |  |  ||_||_| _||_ |_   |


# test_file_multi_line.txt
          _  _  _  _  _  _
  |  |  ||_ |_ |_| _| _|  |
  |  |  ||_||_| _||_ |_   |

 _     _  _  _  _  _  _  _
|_||_||_   ||_  _|  || |  |
|_|  ||_|  | _| _|  ||_|  |
                           
                           


# ocr.rb
class Ocr
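  # Each entry is one digit flattened to 12 characters: four rows of three
  # characters each (the fourth row is blank). The array index is the digit.
  MAPPINGS = [
    " _ | ||_|   ",
    "     |  |   ",
    " _  _||_    ",
    " _  _| _|   ",
    "   |_|  |   ",
    " _ |_  _|   ",
    " _ |_ |_|   ",
    " _   |  |   ",
    " _ |_||_|   ",
    " _ |_| _|   ",
  ]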


  def self.parse(input)
    MAPPINGS.index input.split($/).join
  end


  def self.parse_file(file_name)
    File.readlines(file_name)
      .each_slice(4)
      .map {|four_lines| four_lines
                           .map {|line| line
                                          .chomp
                                          .chars
                                          .each_slice(3)
                                          .to_a
                           }
                             .transpose
                             .map {|number_string| parse number_string.map(&:join)
                                                     .join($/) }.join.to_i }
  end
end

The code above is definitely not production-ready! As you can see, it is mostly a one-liner. Good luck deciphering that! Here’s some reference material to help you out: Array#index, String#split, Array#join, IO.readlines, Enumerable#each_slice, Enumerable#map, String#chomp, String#chars, Enumerable#to_a, Array#transpose and String#to_i.
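
If the chain of calls is hard to follow, here is a rough step-by-step sketch of the same idea. It is not the code we wrote during the kata: the method names cells, glyphs and decode and the three-digit SAMPLE_ENTRY are made up for illustration. It also assumes every row, including the blank fourth one, is padded with spaces to its full width.

```ruby
# Illustrative walkthrough only; names below are hypothetical, not from ocr.rb.

# One digit per entry: four rows of three characters, flattened to 12 characters.
MAPPINGS = [
  " _ | ||_|   ", "     |  |   ", " _  _||_    ", " _  _| _|   ",
  "   |_|  |   ", " _ |_  _|   ", " _ |_ |_|   ", " _   |  |   ",
  " _ |_||_|   ", " _ |_| _|   ",
].freeze

# A made-up entry showing the digits 1, 2 and 3 (rows padded to 9 characters).
SAMPLE_ENTRY = [
  "    _  _ \n",
  "  | _| _|\n",
  "  ||_  _|\n",
  "         \n",
].freeze

# Step 1: strip the newline and cut one row into 3-character cells.
def cells(line)
  line.chomp.chars.each_slice(3).map(&:join)
end

# Step 2: transpose the rows of cells, so that each element
# holds the four rows belonging to a single digit.
def glyphs(entry)
  entry.map { |line| cells(line) }.transpose
end

# Step 3: join each glyph into its 12-character key, look up the
# index in MAPPINGS, then glue the digits together into one number.
def decode(entry)
  glyphs(entry).map { |glyph| MAPPINGS.index(glyph.join) }.join.to_i
end

p cells(SAMPLE_ENTRY[1])   # ["  |", " _|", " _|"]
p glyphs(SAMPLE_ENTRY)[0]  # ["   ", "  |", "  |", "   "]
p decode(SAMPLE_ENTRY)     # 123
```

The transpose step is the one that does the real work: it turns "rows of cells" into "cells grouped by digit". It raises an IndexError if the rows have different numbers of cells, which is why the padding assumption matters.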
