Customizing Input File Formats for Image Processing in Hadoop
By Jeff Conner
Arizona State University
Abstract:
This paper describes in a general sense how the Hadoop API can be
extended to deal with multiple file formats beyond ASCII text. This
technique is applied to binary image files in order to enable Hadoop to
implement image processing techniques on a large scale. Several image
processing algorithms are used to demonstrate this technique, along
with a few different approaches to image file segmentation that are
similar to what is already done for ASCII text file segmentation.
Introduction:
Since its inception, Hadoop has traditionally been thought of as an
ASCII text file processing utility. However, given the nature of the
technology driving cloud computing and the growing development of
toolsets and techniques, it becomes useful to extend Hadoop to deal
with a wide variety of file types beyond ASCII text files. By
extending the current API in the Hadoop library we have built a system
that allows for large scale image analysis using any number of image
processing techniques. In this paper we introduce the API changes
we made to allow Hadoop to handle images, and we demonstrate this
technique using a few sample image processing algorithms.
From a cursory examination of the Hadoop API, it appears that it was
originally designed to work with ASCII files. However, the API allows
for the creation of custom input formats by extending the
“FileInputFormat” class and supplying a matching “RecordReader”.
Through these two extension points, it becomes possible to define the
way in which data is segmented and sent to the Mapper for processing.
We created custom
input formats to support each image processing algorithm we
implemented. We will introduce each algorithm and its custom input
format, and explain the implementation details of each one.
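
To make the idea of segmenting an image into records more concrete, the
following is a minimal sketch in plain Java. It does not use any Hadoop
classes, and it is not the implementation from the paper. The class
ImageTileSplitter, the Tile type, and the fixed tile size are all
hypothetical names and choices introduced here for illustration. In a
real job, logic of this kind would sit behind a custom input format and
its record reader, so that each tile (or each whole image) reaches the
Mapper as one record.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of Hadoop and not taken from the paper.
 * It shows one possible way to cut an image into rectangular tiles.
 */
public class ImageTileSplitter {

    /** One tile of the image, analogous to one record handed to a Mapper. */
    public static final class Tile {
        public final int x;
        public final int y;
        public final int width;
        public final int height;

        public Tile(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        @Override
        public String toString() {
            return "Tile[x=" + x + ", y=" + y
                    + ", width=" + width + ", height=" + height + "]";
        }
    }

    /**
     * Cuts a width x height image into tiles of at most tileSize pixels
     * per side. Tiles on the right and bottom edges are smaller when the
     * image size is not a multiple of tileSize.
     */
    public static List<Tile> split(int width, int height, int tileSize) {
        if (width <= 0 || height <= 0 || tileSize <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        List<Tile> tiles = new ArrayList<>();
        for (int y = 0; y < height; y += tileSize) {
            for (int x = 0; x < width; x += tileSize) {
                int w = Math.min(tileSize, width - x);
                int h = Math.min(tileSize, height - y);
                tiles.add(new Tile(x, y, w, h));
            }
        }
        return tiles;
    }

    public static void main(String[] args) {
        // Example: a 1000 x 600 image cut into tiles of up to 512 pixels.
        for (Tile t : split(1000, 600, 512)) {
            System.out.println(t);
        }
    }
}
```

Non-overlapping tiles are the simplest choice, but an algorithm such as
blob detection may see a single object cut across a tile boundary. A
design would then need either overlapping tiles or a merge step after
the map phase; which option the paper uses is not stated in this
excerpt.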
Excerpt:

Figure 1: Original input image for blob detection algorithm.
Figure 2: Product comparison of the serial implementation (left) versus
the Hadoop implementation with random coloring of the blobs (right).
Read the full paper by following the PDF download below:
| Attachment | Size |
|---|---|
| Customizing Input File Formats for Image Processing in Hadoop.pdf | 349.41 KB |
