prep4PhotographicMosaic(threaded).pl
Photographic Mosaics: Part I - Threaded Tile Generator

Introduction

Creating a photographic mosaic involves replacing every section of a reference image with scaled-down versions of other photos. The replacement images constitute the tiles of the mosaic. Which tiles are best suited will depend on some criterion. A simple rule is to choose those tiles that minimize the deviation between their mean RGB values and those of the corresponding sections of the reference image. Though characterizing a best match with only two triplets of numbers is rather coarse, the rule proves to be effective.
Creating the tiles is however a computationally intensive process. It is clearly best to isolate this task from that of creating the actual mosaic. The undertaking can be mitigated using a multi-core computer, with or without hyper-threading, and an appropriate operating system. Such a platform is then capable of creating the tiles in parallel, thus reducing the computational time significantly.
This article discusses a parallel-processing program that can be implemented on even the humblest of today's desktops. Though its primary purpose is of course tile production, the program also generates an animated 3D scatter plot of the tile values. The plot provides a quick visual means of assessing the tile coverage in RGB space. We will then describe an XML-based input scheme for specifying the various required program parameters. Finally, we will present benchmark results in which a dual-core, non-hyperthreading platform is pitted against a faster single core, hyperthreading one.

Threads

The algorithm presented here meets the following basic goals. Given a set of photographic images: (a) scale each image to some specified width and height dimensions, (b) write the resulting tile file to disk, (c) compute the tile's mean normalized red, green and blue intensities of all its pixels, and (d) store these averages in an unordered flat-file database. Also recorded are the tile's filename and a reference to the source image collection so that the latter may be invoked separately when creating a photographic mosaic. Finally, repeat all the aforementioned steps for every image set specified.
Concurrent processing is at the core of the algorithm's implementation where a simple parallel pipeline model is used. The fundamental tasks are assigned to various threads. The results and the inter-thread signals are conveyed by means of thread queues. Thread #0, i.e., the main program, is responsible for finding all the images in all the specified sets and listing their filepaths to a first thread queue. Multiple secondary threads each retrieve the filepaths from the queue as soon as they become available. The tiles are created by scaling copies of the source images. Moreover, these tile-creator threads compute the mean normalized RGB values for each tile. In turn, they report their computational results to a common second thread queue. These latter values are concurrently dequeued by one last additional thread that writes them to the database. The records are written on a first-come, first-served basis which normally results in an unordered data set. Finally, note that all the secondary threads terminate "gracefully" upon being informed by an upstream thread, also via the thread queues, that no more data is forthcoming. The whole aforementioned process can be illustrated by the following conceptual Petri Net where the repetition arcs have been omitted:
Keep in mind that threads are also used by operating systems, such as Windows, to carry out their basic processes. In order to alleviate thread starvation, the OS schedules segmented execution of all the threads amongst the various cores. Accordingly, one usually doesn't know which thread is currently being executed and on which CPU. On the one hand, the more cores a system has, the more threads can be executed simultaneously. But, on the other, if there are too many threads active, the responsiveness of the system will suffer. Another major factor that comes into play is the I/O bottleneck that ensues from the various threads wanting to access the hard drive(s) simultaneously. Consequently, there is an optimal number of secondary threads for our algorithm that will depend on the platform it is running on.

3D Scatter Plot

Experience has shown us that it is worthwhile having an appreciation of how the resulting tile values are distributed in RGB space. This is especially true for "small" sets. If the RGB values are close to being uniformly distributed, then a mosaic program will be able to match from a varied selection of tiles at all times. In turn, the resulting mosaic will be made up of many different tiles, creating a more intriguing and therefore more pleasing visual effect. If, on the other hand, the values are strongly clustered, then a small number of outliers are favoured to be chosen repeatedly when matching anywhere in their vicinity. Consequently, large blocks made up of the same tile can appear in a mosaic and hence fail to convey the color variations in the reference image.
Thus, in order to visualize the resulting tile distribution, the main thread's final major task is to output a 3D scatter plot of the mean normalized RGB tile values. It is rendered as an animated GIF with the gridded plot shown rotating about a vertical axis through 360 degrees: the data points are represented by small crosses. It is created with the aid of the external application gnuplot (v4.4 or later). Compared to the tile creation process, the required computational effort is not significant: the format of the tile data file was chosen to be compatible.
As a further visual reference, the plot includes the grayscale line. RGB grayscale values are points on the (r,g,b) line given by r = g = b. Thus, in the normalized RGB space, black corresponds to r = g = b = 0, white to r = g = b = 1 and the remaining shades of gray to the intermediate values in the open interval (0,1).
The scatter plot for our current collection of 3,111 16x16 px tiles is as follows:
It clearly shows a strong clustering about an inclined plane. Can't help thinking that the plot looks like a fish skewered from head to tail! Anyway, matching near the sides of the RGB cube will be strongly biased in favour of the outliers. For example, there are absolutely no tiles in the immediate vicinity of the point for the color gold (r=1, g=0.843137255, b= 0). We are thus forewarned that not all subject images can be made into pleasing mosaics with our current tile collection.
Individual frames can be readily viewed/extracted by many free image editing programs, such as GIMP and IrfanView.

XML Input

The algorithm's implementation can be viewed as an assembly line with parallel work stations. And, as with all such divisions of labour, each worker requires a clear set of instructions. However, writing each instruction as an argument on a program's command line would not be practical. What is needed instead is a manual of job specifications. In our case, this manual takes the form of an XML file. It is a straightforward matter to create/edit such a file with any text-processing application. Moreover, it has the benefit of documenting what was produced and how it was done. The location of this manual will then be the sole argument.
To explain the required XML tags, let's examine the input file tiles_16x16.xml ( Download ) which governed the creation of our 16x16 px tile set. Note that the filepaths are all specified in a Unix style.
XML file "tiles_16x16.xml" Remarks
<prep4PhotographicMosaic noThreads=""> Specifies the document root node as "prep4PhotographicMosaic" along with the number of tile-creator threads to use: the additional thread that updates the database is always a given. When not specified, as is the case here, the program will default to the number of processors reported in the environment. This latter value includes both real and logical processors.
<images collection="plants" topDir="_images/ABB_plants" /> Declares the first of eleven image sets to process. The attribute "collection" provides the reference name for the set. The attribute "topDir" specifies the starting directory in which the program will attempt to locate recognized image files. It will recurse through all the associated daughter directories.
<images collection="paris" topDir="_images/ABB_paris" />
<images collection="paris" topDir="_images/FR_paris" /> Declares the third image set to process. Note that it carries the same collection name as the previous set. Consequently, invoking "paris" when creating a photographic mosaic will bring these and the previous tiles into play.
<images collection="hong kong" topDir="_images/KF_Hong_Kong" />
<images collection="london" topDir="_images/KF_London" />
<images collection="macau" topDir="_images/KF_Macau" />
<images collection="philadelphia" topDir="_images/KF_Philadelphia" />
<images collection="samples" topDir="_images/misc/samples" />
<images collection="monochromatic" topDir="_images/monochromatic" />
<images collection="sky" topDir="_images/sky" />
<images collection="space" topDir="_images/space" /> Declares the last image set. One can pretty much have as many of these declarations as your computing resources can support.
  Blank lines are optional & ignored.
<tiles width="16" height="16" dir="_tiles/tiles_16x16" /> Specifies the rectangular pixel dimensions for the tiles. Moreover, it names the target folder in which the tiles are to be written along with the associated database and scatter plot files. Note that the designated directory, along with all intermediate ones, will be created if they do not exist.
   
<plot width="" height="" delay="60" twirlIncrement="6"> The daughter root node for the scatter plot. The first attribute specifies the pixel width of the plot. Here, the default value of 500 is implicitly accepted. Similarly, the height of the plot is being set to the default value of 500 pixels also. Next, the time delay between frames in the animated GIF is being declared as 60 hundreds of a second: the default value is 1 second, that is, 100 hundreds of a second. Finally, the rotation increment per frame is defined as 6 degrees with the default value being 10. Note that, for a smooth animation, this angle should be a small divisor of 360.
<gnuplot>C:/Program Files/gnuplot/binary/gnuplot.exe</gnuplot> The file specification of the main gnuplot executable. If, on the other hand, its path is already included in your environment's path declaration, then this entry can be left blank, viz., <gnuplot></gnuplot> or, more simply. <gnuplot />.
</plot> Terminates the declarations for the scatter plot.
</prep4PhotographicMosaic> end of the XML declarations
N.B. 1: Given that no XSLT style sheet is associated with the XML file, any or all of the tag attributes may also be expressed as actual tags. For example, instead of
<images collection="space" topDir="_images/space" />
one could write
<images>
<collection>space</collection>
<topDir>_images/space</topDir>
</images>
Moreover, the reverse is also true, as in
<plot width="" height="" delay="60" twirlIncrement="6" gnuplot=""></plot>
Finally note that the order of the tag attributes is not important. Thus the previous declaration is equivalent to
<plot gnuplot="" width="" twirlIncrement="6" delay="60" height="" />
N.B. 2: Once a tile has been created, the corresponding source image file will have its "read-only" attribute set. Conversely, all image files marked as "read-only" will not be processed. In consequence, upon adding new images to the folders of previously processed image sets, one only needs to re-run the program with the original XML declaration(s) in order to have the new and only the new images made into tiles.
N.B. 3: A scatter plot of all the tiles is always drawn whether new tiles have been created or not. Therefore one can repeatedly run the program in order to experiment with just the plot parameters.

Code Listing

The above scheme was implemented on a Windows XP Pro platform using ActivePerl v5.10.1 (build 1007 for MSWin32-x86-multi-thread), with the imports for threads and thread queues, along with ImageMagick's drawing primitives accessed through its PerlMagick interface. The content of the XML file is read in with the aid of the modules XML::Simple and XML::Parser. And, as previously mentioned, the scatter plot requires the installation of gnuplot v4.4 or latter.
The resulting script, prep4PhotographicMosaic(threaded).pl ( Download - SHA1 Checksum: 173adb07ff64728f76a0821735f5030842fb6728 ) is released for personal, non-commercial and non-profit use only.
The listing includes the line numbers in order to reference them in the general remarks in the next section.
001 #ActivePerl v5.10.1, build 1007 for MSWin32-x86-multi-thread
002 #===== Copyright 2010, Webpraxis Consulting Ltd. - ALL RIGHTS RESERVED - Email: webpraxis@gmail.com ============================
003 # prep4PhotographicMosaic(threaded).pl: Threaded photographic mosaic tile generator.
004 #===============================================================================================================================
005 #           Usage : perl prep4PhotographicMosaic(threaded).pl XmlFile
006 #       Arguments : XmlFile = path of the XML file detailing the processing specifications.
007 #     Input Files : Designated image files, with extensions .bmp, .gif, .jpeg, .jpg, .png or .tif (case insensitive).
008 #    Output Files : Written to the designated tile subdirectory (it will be created if it does not exist), are
009 #                    - the GIF tile files, in the form "tile_k.gif" where "k" denotes a positive integer code.
010 #                    - "_tiles.dat", a TSV file for each tile's filename, associated image collection name and
011 #                      mean normalized RGB values.
012 #                    - "_tiles(m).dat", a tentative backup copy of a previous "_tiles.dat" where "m" is the smallest
013 #                      positive integer that ensured a non-conflicting filename. It will be retained only if it differs
014 #                      with "_tiles.dat".
015 #                    - "_scatterPlot.gif", an animated rotating 3D scatter plot of the mean normalized RGB tile values.
016 #                    - "_scatterPlot(n).gif", a tentative backup copy of a previous "_scatterPlot.gif" where "n" is the
017 #                      smallest positive integer that ensured a non-conflicting filename. It will be retained only if it
018 #                      differs with "_scatterPlot.gif".
019 # Temporary Files : "_scatterPlot.rot", a text file containing the required rotation commands for the gnuplot application.
020 #                   It will be created in one of the temporary directories reported in the environment. If none are available,
021 #                   then the current directory will be used. Moreover, the file will be deleted upon normal termination.
022 #         Remarks : See http://www.webpraxis.ab.ca/photo_mosaic_v2/prep4PhotographicMosaic(threaded).php for details.
023 #         History : v2.0.0: October 1, 2010 - Original release.
024 #===============================================================================================================================
025 # 0) INITIALIZE:
026 use strict;                                                                                 #pragma to restrict unsafe constructs
027 use warnings;                                                                               #pragma for all optional warnings
028 use Time::HiRes qw(gettimeofday tv_interval);                                               #high resolution interval timer, v1.9721
029 use File::Basename;                                                                         #core module to parse file specifications
030 use File::Compare;                                                                          #core module to compare files
031 use File::Copy;                                                                             #core module to copy files
032 use File::Find;                                                                             #core module to traverse a directory tree
033 use File::Path qw(make_path);                                                               #core module to create directory trees, v2.08
034 use Image::Magick;                                                                          #PerlMagick from ImageMagick-6.6.3-4-Q16 Win package
035 use Term::ANSIScreen qw/:color :cursor :screen/;                                            #terminal control using ANSI escape sequences, v1.42
036 use Win32::Console::ANSI qw/ Cursor /;                                                      #emulate ANSI console on Win32 system, v1.04
037 use Win32::File qw(:DEFAULT GetAttributes SetAttributes);                                   #read & set attributes of MS files, v0.06
038 use threads;                                                                                #pragma for interpreter-based threads, v1.79
039 use threads::shared;                                                                        #pragma for sharing data structures between threads, v1.33
040 use Thread::Queue;                                                                          #core module for thread-safe queues, v2.11
041 use XML::Simple;                                                                            #module to maintain XML files, v2.18
042     $XML::Simple::PREFERRED_PARSER = 'XML::Parser';                                         #set module to parse XML documents, v2.36-r1
043 use constant FATAL => colored ['white on red'],"\n\n\aFATAL ERROR:",colored ['reset'], ' '; #intro msg when reporting a fatal error
044 
045 my $StartTime       = [gettimeofday];                                                       #record start of execution time
046 
047 $| = 1;                                                                                     #set STDOUT buffer to auto-flush
048 cls();                                                                                      #clear screen
049 print colored ['black on white'], "$0\n\n\n";                                               #display program name
050 
051 ( my $XmlFile       = shift ) =~ tr#\\#/#;                                                  #get path of XML file (Unix style)
052 die FATAL, "Cannot locate the XML file '$XmlFile' -" unless -e $XmlFile;                    #check its existence
053 
054 my $Specs           = XMLin($XmlFile, forcearray => [qw(images)], SuppressEmpty => undef);  #read the XML data file for the specifications
055 
056 my $N               = $$Specs{noThreads} || $ENV{NUMBER_OF_PROCESSORS};                     #parameterize number of tile-creator threads
057 die FATAL, 'Invalid number of tile-creator threads -' unless $N > 0;                        #check that there's at least one thread
058 
059 my @ImageSets;                                                                              #parameterize the image sets
060 push @ImageSets, map{ \%$_ } @{ $$Specs{images} };                                          #deep copy the info
061 die FATAL, 'No image set(s) defined -' unless @ImageSets;                                   #check that there's at least one set
062 
063 my $TileWidth       = $$Specs{tiles}{width};                                                #parameterize the tile width
064 my $TileHeight      = $$Specs{tiles}{height};                                               #parameterize the tile height
065 my $Geometry        = "${TileWidth}x${TileHeight}";                                         #geometry of a tile: width, height
066 die FATAL, 'Invalid tile width specified -' unless $TileWidth > 0;                          #check for positive value
067 die FATAL, 'Invalid tile height specified -' unless $TileHeight > 0;                        #check for positive value
068 
069 ( my $TmpDir        = $ENV{TMP} || $ENV{TEMP} || '.' )  =~ tr#\\#/#;                        #define the temporary directory (Unix style)
070 ( my $TileDir       = $$Specs{tiles}{dir} )             =~ tr#\\#/#;                        #parameterize the tile directory (Unix style)
071 my $TileData        = "$TileDir/_tiles.dat";                                                #set the filepath for the tile data
072 die FATAL, "Cannot create tile directory '$TileDir': $! -"                                  #check existence of target tile directory:
073  unless ( -d $TileDir ) or make_path( $TileDir, { verbose => 0, mode => 0600 } );           # or create it if need be
074 
075 my $Gnuplot         = $$Specs{plot}{gnuplot}        || 'gnuplot.exe';                       #parameterize path of gnuplot executable
076 my $PlotDelay       = $$Specs{plot}{delay}          || 100;                                 #parameterize the animation frame delay
077 my $PlotHeight      = $$Specs{plot}{height}         || 500;                                 #parameterize the plot height
078 my $PlotWidth       = $$Specs{plot}{width}          || 500;                                 #parameterize the plot width
079 my $PlotTwirl       = $$Specs{plot}{twirlIncrement} || 10;                                  #parameterize the plot rotation increment
080 my $PlotFile        = "$TileDir/_scatterPlot.gif";                                          #define the scatter plot file
081 my $PlotRotation    = "$TmpDir/_scatterPlot.rot";                                           #define the plot rotation command file
082 die FATAL, 'Invalid animation frame delay specified -' unless $PlotDelay >= 0;              #check for non-negative value
083 die FATAL, 'Invalid plot height specified -' unless $PlotHeight > 0;                        #check for positive value
084 die FATAL, 'Invalid plot width specified -' unless $PlotWidth > 0;                          #check for positive value
085 die FATAL, 'Invalid plot rotation increment specified -' unless $PlotTwirl > 0;             #check for positive value
086 
087 my @Threads;                                                                                #thread description:
088 push @Threads, "#0 = main";                                                                 # main program
089 push @Threads, "#$_ = tile creator" for ( 1..$N );                                          # tile-creator threads
090 push @Threads, "#" . ( $N + 1 ) . " = tile-data updater";                                   # tile-data updater thread
091 
092 print                                                                                       #echo initialization results
093         colored ['reset'], 'XML input file   : ', colored ['bold white'], $XmlFile, "\n\n",
094         colored ['reset'], 'No of Image Sets : ', colored ['bold white'], scalar @ImageSets, "\n\n",
095         colored ['reset'], 'Tile Width       : ', colored ['bold white'], $TileWidth, " px\n",
096         colored ['reset'], 'Tile Height      : ', colored ['bold white'], $TileHeight, " px\n",
097         colored ['reset'], 'Tile Directory   : ', colored ['bold white'], $TileDir, "\n",
098         colored ['reset'], 'Tile Data        : ', colored ['bold white'], $TileData, "\n\n",
099         colored ['reset'], 'Scatter Plot     : ', colored ['bold white'], $PlotFile, "\n",
100         colored ['reset'], ' - Width         : ', colored ['bold white'], $PlotWidth, " px\n",
101         colored ['reset'], ' - Height        : ', colored ['bold white'], $PlotHeight, " px\n",
102         colored ['reset'], ' - Frame Delay   : ', colored ['bold white'], $PlotDelay, " hundreds of a second\n",
103         colored ['reset'], ' - Rotation Inc. : ', colored ['bold white'], $PlotTwirl, " degree(s)\n\n",
104         colored ['reset'], 'Threads          : ', colored ['bold white'], join( ",\n" . ' ' x 19, @Threads ), "\n\n",
105         colored ['reset'], '-' x 80, "\n\n";
106 
107 my $CursorY         = ( Cursor() )[1];                                                      #record cursor row position
108 #-------------------------------------------------------------------------------------------------------------------------------
109 # 1) CREATE THE TILES FOR EACH SPECIFIED IMAGE SET:
110 my $Collection;                                                                             #image collection name
111 my $NoImages                = 0;                                                            #number of images processed
112 my $NoBytes                 = 0;                                                            #number of bytes processed
113 my $NoThreadsDone:shared    = 0;                                                            #number of tile-creator threads that have ended
114 my $TileCode:shared         = &initTileCode();                                              #init tile code value
115 my $ImageQueue              = Thread::Queue->new();                                         #queue for the image data
116 my $TileQueue               = Thread::Queue->new();                                         #queue for the tile data
117 my %FindOptions             = ( wanted      => \&enqueueImage,                              #reference to callback function for enqueueing image data
118                                 no_chdir    => 1                                            #suppress directory changes on recurse
119                               );
120 
121 threads->create( \&makeTiles )->detach() for ( 1..$N );                                     #launch thread(s) for creating tiles
122 threads->create( \&updateTileData );                                                        #launch thread for updating the tile data
123 
124 &report( 0, 'bold green', 'Populating image queue...' );                                    #report processing step
125 for( @ImageSets ) {                                                                         #repeat for each image set
126     $Collection = $$_{collection};                                                          # parameterize the image collection name
127     die FATAL, "Invalid collection name '$Collection': $! -" unless $Collection;            # check existence of collection name
128     die FATAL, "No such directory '$$_{topDir}': $! -" unless -d $$_{topDir};               # check existence of source image directory
129     find( \%FindOptions, $$_{topDir} );                                                     # populate image queue
130 }                                                                                           #until all image sets processed
131 $ImageQueue->enqueue( map{ undef } 1..3*$N );                                               #signal creator threads that no more images forthcoming
132 &report( 0, 'bold yellow', 'Waiting...' );                                                  #report wait state
133 $_->join() for threads->list(threads::running);                                             #wait for data-updater thread to end
134 my $Elapsed = tv_interval( $StartTime );                                                    #determine total elapsed time for tile creation
135 #-------------------------------------------------------------------------------------------------------------------------------
136 # 2) CREATE ANIMATED 3D SCATTER PLOT OF TILE RGB VALUES:
137 &report( 0, 'bold green', 'Creating 3D scatter-plot animation...' );
138 &animScatterPlot();
139 &report( 0, 'bold red', 'Done' );                                                           #report termination
140 #-------------------------------------------------------------------------------------------------------------------------------
141 # 3) REPORT PROCESSING STATISTICS:
142 print locate( $CursorY + $N + 3, 1 ),
143       colored ['bold white'], "\a", $NoImages,
144       colored ['reset'], ' images / ',
145       colored ['bold white'], &niceSize( $NoBytes, 2 ),
146       colored ['reset'], ' processed in ',
147       colored ['bold white'], $Elapsed,
148       colored ['reset'], " secs\nThroughput: ",
149       colored ['bold white'], &niceSize( $NoBytes/$Elapsed, 2 ), "/sec\n";
150 exit;                                                                                       #end processing
151 #===== SUBROUTINES =============================================================================================================
152 #            Usage : &enqueueImage()
153 #          Purpose : Processes the result of File::Find. For each file recognized as an image type and not set as "read-only",
154 #                    it enqueues the corresponding path, attributes and image collection name for the tile-creator thread(s)
155 #                    to process.
156 #        Arguments : $_ = file path
157 #          Returns : None.
158 #  Externals -  In : $Collection, $NoBytes, $NoImages
159 #  Externals - Out : $NoBytes, $NoImages
160 #     Shared -  In : None.
161 #     Shared - Out : None.
162 #           Queues : $ImageQueue
163 #             Subs : None.
164 #          Remarks : None.
165 #          History : v2.0.0 - October 1, 2010 - Original release.
166 
167 sub enqueueImage {                                                                          #begin sub
168     my $attrbs;                                                                             # file attributes
169 
170     return unless /\.(bmp|gif|jpeg|jpg|png|tif)$/i;                                         # return unless dealing with a recognized image file
171     GetAttributes( $_, $attrbs );                                                           # get the file attributes
172     return if $attrbs & READONLY;                                                           # return if "read-only" attribute set
173     ++$NoImages;                                                                            # increment image counter
174     $NoBytes += -s $_;                                                                      # increment byte count
175     $ImageQueue->enqueue( $_, $attrbs, $Collection );                                       # enqueue file path, attributes & collection name
176 }                                                                                           #end sub enqueueImage
177 #-------------------------------------------------------------------------------------------------------------------------------
178 #            Usage : &fileBackup( Source )
179 #          Purpose : Copies the designated source file to a target file with a similar file specification. The target's basename
180 #                    will differ by having a positive integer, enclosed in parentheses, appended. The chosen integer will be the
181 #                    smallest possible value that ensures a non-conflicting filename.
182 #        Arguments : Source = Source file specification.
183 #          Returns : String containing the target file specification or the undefined value.
184 #  Externals -  In : FATAL.
185 #  Externals - Out : None.
186 #     Shared -  In : None.
187 #     Shared - Out : None.
188 #           Queues : None.
189 #             Subs : None.
190 #          Remarks : The undefined value is returned if the source file does not exist.
191 #          History : v1.0.0 - October 1, 2010 - Original release.
192 
193 sub fileBackup {                                                                            #begin sub
194     my $source      = shift;                                                                # parameterize the source file specification
195     my $target      = $source;                                                              # init backup file specification
196     my $modifier    = 0;                                                                    # init basename modifier
197 
198     return undef unless -e $source;                                                         # return undefined value if source does not exist
199 
200     fileparse_set_fstype('');                                                               # force Unix-style interpretation
201     my ( $basename, $path, $suffix ) = fileparse( $source, '\..*' );                        # split the source file spec into components
202     $target = "$path${basename}(" . ++$modifier . ')' . $suffix while -e $target;           # compose unique filename
203 
204     copy( $source, $target )                                                                # copy source file to target
205      or die FATAL, "Cannot backup the file '$source' to '$target': $! -";
206     return $target;                                                                         # return the target file specification
207 }                                                                                           #end sub fileBackup
208 #-------------------------------------------------------------------------------------------------------------------------------
209 #            Usage : &initTileCode()
210 #          Purpose : Returns a starting value for the tile code.
211 #        Arguments : None.
212 #          Returns : Integer scalar.
213 #  Externals -  In : FATAL, $TileDir
214 #  Externals - Out : None.
215 #     Shared -  In : None.
216 #     Shared - Out : None.
217 #           Queues : None.
218 #             Subs : None.
219 #          Remarks : Scans the tile directory to find the greatest of the code values used in the tile basenames.
220 #          History : v1.0.0 - October 1, 2010 - Original release.
221 
222 sub initTileCode {                                                                          #begin sub
223     my $code = 0;                                                                           # greatest code value
224 
225     local *TILES;                                                                           # handle for the tile directory
226 
227     opendir( TILES, $TileDir )                                                              # open the tile directory for read
228      or die FATAL, "Cannot open the tile directory '$TileDir' for read: $! -";
229     while( $_ = readdir( TILES ) ) {                                                        # repeat for every entry
230         next unless /^tile_(.+)\.gif$/;                                                     #  skip entry if not a tile
231         $code = $1 if $1 > $code;                                                           #  update greatest code value
232     }                                                                                       # until all directory entries processed
233     closedir( TILES );                                                                      # close the tile directory
234     return ++$code;                                                                         # return starting value
235 }                                                                                           #end sub initTileCode
236 #-------------------------------------------------------------------------------------------------------------------------------
237 #            Usage : &niceSize( ByteCount {,Precision} )
238 #          Purpose : Formats a byte count as a "human-readable" string.
239 #        Arguments : ByteCount = number of bytes
240 #                    Precision = optional number of decimal places
241 #          Returns : "Human-readable" bytecount string.
242 #  Externals -  In : None.
243 #  Externals - Out : None.
244 #     Shared -  In : None.
245 #     Shared - Out : None.
246 #           Queues : None.
247 #             Subs : None.
248 #          Remarks : Original code sourced from http://www.jb.man.ac.uk/~slowe/perl/filesize.html.
249 #                    Corrected and modified for IEC binary prefixes.
250 #          History : v1.0.0 - October 1, 2010 - Original release.
251 
252 sub niceSize {                                                                              #begin sub
253     my $size    = shift;                                                                    # parameterize the byte count
254     my $ndp     = shift() || 0;                                                             # parameterize the number of decimal places
255     my @units   = ( 'bytes', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB' );      # define the binary units
256     my $unitIdx = 0;                                                                        # init the units array index
257     my $dp      = 10**$ndp;                                                                 # init the precision factor
258 
259     while( $size > 1024 ) {                                                                 # repeat while byte count exceeds 2**10
260         $size /= 1024;                                                                      #  divide byte count by 2**10 to get next unit
261         $unitIdx++;                                                                         #  set corresponding unit term
262     }                                                                                       # until byte count appropriately factored
263     return ( $units[$unitIdx] ) ? ( int( $size*$dp ) / $dp )  . " " . $units[$unitIdx]      # return human-readable string
264                                 : $_[0];                                                    # or original byte count if it exceeds 1024 YiB!
265 }                                                                                           #end sub niceSize
266 #-------------------------------------------------------------------------------------------------------------------------------
267 #            Usage : &report( ThreadId, Colors, Msg );
268 #          Purpose : Reports a thread's processing status.
269 #        Arguments : ThreadId = thread integer id
270 #                    Colors   = color attribute of message
271 #                    Msg      = message to be displayed
272 #          Returns : None.
273 #  Externals -  In : $CursorY
274 #  Externals - Out : None.
275 #     Shared -  In : None.
276 #     Shared - Out : None.
277 #           Queues : None.
278 #             Subs : None.
279 #          Remarks : None.
280 #          History : v2.0.0 - October 1, 2010 - Original release.
281 
282 sub report {                                                                                #begin sub
283     my( $threadId, $colors, $msg ) = @_;                                                    # parameterize the arguments
284 
285     print locate( $CursorY + $threadId, 1 ), clline,                                        # clear line coresponding to thread
286           colored [ 'reset' ], "Thread #$threadId: ",                                       # display heading
287           colored [ $colors ], $msg;                                                        # display message
288 }                                                                                           #end sub report
289 #-------------------------------------------------------------------------------------------------------------------------------
290 #            Usage : &animScatterPlot();
291 #          Purpose : Creates an animated GIF of a rotating 3D scatter plot of the mean normalized RGB tile values.
292 #        Arguments : None.
293 #          Returns : None.
294 #  Externals -  In : FATAL, $Gnuplot, $PlotDelay, $PlotFile, $PlotHeight, $PlotRotation, $PlotTwirl, $PlotWidth, $TileData
295 #  Externals - Out : None.
296 #     Shared -  In : None.
297 #     Shared - Out : None.
298 #           Queues : None.
299 #             Subs : fileBackup
300 #          Remarks : Based on the Gnuplot demonstration code "animate.dem" and "gnuplot.rot".
301 #          History : v1.0.0 - October 1, 2010 - Original release.
302 
303 sub animScatterPlot {                                                                       #begin sub
304     my $backup      = &fileBackup( $PlotFile );                                             # tentatively backup any previous scatter plot
305     my $noFrames    = int( 360 / $PlotTwirl );                                              # number of animation frames to generate
306 
307     local *GNUPLOT;                                                                         # output filehandle
308 
309     open( GNUPLOT, ">$PlotRotation" )                                                       # open the gnuplot rotation command file for write
310      or die FATAL, "Cannot create the command rotation file '$PlotRotation': $! -";
311     print GNUPLOT <<ROTATION;                                                               # write rotation commands:
312 frame_count = frame_count + 1
313 frame_title = sprintf( "Scatter Plot of Mean Normalized RGB Tile Values\\n( Rotation Angle = %i%c )", zview( zrot ), 176 )
314 set title frame_title offset 0,2,0 font "Arial,14"
315 set view xrot,zview( zrot )
316 replot
317 zrot = ( zrot + $PlotTwirl ) % 360
318 if( frame_count < $noFrames ) reread
319 ROTATION
320 
321     open( GNUPLOT, "|$Gnuplot" )                                                            # open pipe to the gnuplot executable
322      or die FATAL, "Cannot open the gnuplot program '$Gnuplot': $! -";
323     print GNUPLOT <<COMMANDS;                                                               # send plotting commands:
324 unset key
325 set hidden3d
326 set origin 0,-0.05
327 set border 4095 back lt 0 lc rgb "gray40"
328 set grid x y z back
329 set xyplane at 0
330 set xrange [0:1] noreverse nowriteback
331 set yrange [0:1] noreverse nowriteback
332 set zrange [0:1] noreverse nowriteback
333 set lmargin 8
334 set xlabel "Red" font "Arial,14" tc rgb "red"
335 set ylabel "Green" font "Arial,14" tc rgb "green"
336 set zlabel "Blue" offset -3,0,0 font "Arial,14" tc rgb "blue"
337 
338 xrot = 60
339 zrot = 50
340 zview( zrot ) = ( zrot%90 > 0 ) ? zrot : zrot + 2
341 
342 set terminal unknown
343 set datafile separator "\t"
344 set arrow from 0,0,0 to 1,1,1 nohead lt -1 lw 2 lc rgb "dark-gray"
345 splot "$TileData" using 3:4:5 with points lc rgb "steelblue"
346 
347 set terminal gif font "Arial" 8 size $PlotWidth,$PlotHeight animate delay $PlotDelay loop 0 nooptimize
348 set output "$PlotFile"
349 frame_count = 0
350 load "$PlotRotation"
351 unset output
352 reset
353 COMMANDS
354     close( GNUPLOT );                                                                       # end command stream
355     unlink $PlotRotation;                                                                   # delete the temporary file
356     if( defined( $backup ) ) {                                                              # delete backup if no change
357         unlink $backup unless compare( $PlotFile, $backup );
358     }
359 }                                                                                           #end sub animScatterPlot
360 #===== THREADED SUBROUTINES ====================================================================================================
361 #            Usage : &makeTiles()
362 #          Purpose : Retrieves file data from the image queue, creates the corresponding tile, computes its mean RGB values
363 #                    and adds the tile data to the tile queue. Also sets the "read-only" attribute of the image file to
364 #                    mark it as having been processed.
365 #        Arguments : None.
366 #          Returns : None.
367 #  Externals -  In : FATAL, $Geometry, $N, $TileDir
368 #  Externals - Out : None.
369 #     Shared -  In : $NoThreadsDone, $TileCode
370 #     Shared - Out : $NoThreadsDone, $TileCode
371 #           Queues : $ImageQueue, $TileQueue
372 #             Subs : report
373 #          Remarks : Dequeues $ImageQueue for the image filepath, attributes & corresponding image collection name.
374 #                    Enqueues $TileQueue with the tile's particulars as a TSV string.
375 #          History : v2.0.0 - October 1, 2010 - Original release.
376 
377 sub makeTiles {                                                                             #begin sub
378     my $threadId    = threads->tid();                                                       # get thread id
379     my $tile        = Image::Magick->new( magick => 'GIF' );                                # instantiate an image object for a tile
380     my $errorMsg;                                                                           # error message returned by PerlMagick
381     my $tileFile;                                                                           # tile filename
382     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
383     sub meanColor {                                                                         # begin sub to compute a mean normalized channel color
384         my @colors = $tile->GetPixels                                                       #  get channel intensities of all pixels:
385                                 (   map         => shift,                                   #   get color channel designation,
386                                     geometry    => $Geometry,                               #   specify tile geometry,
387                                     normalize   => 'true'                                   #   request normalized values
388                                 );
389         my $mean;                                                                           #  average value
390         $mean += $_ for @colors;                                                            #  sum all intensities
391         $mean /= scalar @colors;                                                            #  return the average value
392     }                                                                                       # end sub meanColor
393     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
394     while( my ( $imageFile, $attribs, $collection ) = $ImageQueue->dequeue( 3 ) ) {         # repeat; get image file path, attributes & collection name
395         last unless defined $imageFile;                                                     #  end repeat if end of queue encountered
396         {   lock $TileCode;                                                                 #  get lock on tile code
397             $tileFile = 'tile_' . $TileCode++ . '.gif';                                     #  compose tile filename & increment tile code
398         }                                                                                   #  release lock on tile code
399         &report( $threadId, 'bold green', "Processing $imageFile -> $tileFile" );           #  report processing status
400 
401         @$tile = ();                                                                        #  clear tile object
402         $tile->Read( $imageFile );                                                          #  init tile with source image file
403         $tile->Quantize( colorspace => 'RGB' );                                             #  insure RGB colorspace
404         $tile->Scale( geometry => "$Geometry!" );                                           #  scale tile image
405         die FATAL, "Cannot create tile for '$imageFile': $errorMsg\n -"                     #  abort if unable to save the tile to disk
406          if $errorMsg = $tile->Write( filename => "$TileDir/$tileFile" );
407         SetAttributes( $imageFile, $attribs | READONLY );                                   #  mark image as processed
408 
409         $TileQueue->enqueue( join "\t",                                                     #  enqueue tile data as a TSV string:
410                                 $tileFile,                                                  #   tile filename
411                                 $collection,                                                #   image collection name,
412                                 &meanColor( 'r' ),                                          #   mean normalized red color,
413                                 &meanColor( 'g' ),                                          #   mean normalized green color
414                                 &meanColor( 'b' )                                           #   mean normalized blue color
415                           );
416         &report( $threadId, 'bold yellow', 'Waiting...' );                                  #  report  waiting status
417     }                                                                                       # until all image data dequeued
418     {   lock $NoThreadsDone;                                                                # get lock on threads-done counter
419         $TileQueue->enqueue( undef )                                                        # signal updater thread that no more data forthcoming
420          if ++$NoThreadsDone == $N;                                                         #  if this is the last tile-creator thread to end
421     }                                                                                       # release lock on counter
422     &report( $threadId, 'bold red', 'Done' );                                               # report end of processing
423 }                                                                                           #end sub makeTiles
424 #-------------------------------------------------------------------------------------------------------------------------------
425 #            Usage : &updateTileData()
426 #          Purpose : Retrieves a new tile's data from the tile queue and writes it to the TSV data file.
427 #        Arguments : None.
428 #          Returns : None.
429 #  Externals -  In : FATAL, $TileData
430 #  Externals - Out : None.
431 #     Shared -  In : None.
432 #     Shared - Out : None.
433 #           Queues : $TileQueue
434 #             Subs : fileBackup, report
435 #          Remarks : Each tile record is a TSV string structured as follows:
436 #                    column 1: tile filename,
437 #                    column 2: image collection name,
438 #                    column 3: mean normalized red color,
439 #                    column 4: mean normalized green color and
440 #                    column 5: mean normalized blue color.
441 #          History : v2.0.0 - October 1, 2010 - Original release.
442 
443 sub updateTileData {                                                                        #begin sub
444     my $threadId    = threads->tid();                                                       # get thread id
445     my $backup      = &fileBackup( $TileData );                                             # tentatively backup any previous tile data
446 
447     local *DATA;                                                                            # output filehandle
448 
449     open( DATA, ">>$TileData" )                                                             # open the tile data file for append
450      or die FATAL, "Cannot open the tile data file '$TileData' for append: $! -";
451     while( my $data = $TileQueue->dequeue() ) {                                             # repeat: get tile data from queue
452         my $file = (split /\t/, $data, 2)[0];                                               #  extract tile filename
453         &report( $threadId, 'bold green', "Processing $file" );                             #  report processing status
454         print DATA $data, "\n";                                                             #  output TSV record
455         &report( $threadId, 'bold yellow', "$file created. Waiting..." );                   #  report waiting status
456     }                                                                                       # until end of queue encountered
457     close( DATA );                                                                          # close the tile data file
458 
459     if( defined( $backup ) ) {                                                              # delete backup if no change
460         unlink $backup unless compare( $TileData, $backup );
461     }
462     &report( $threadId, 'bold red', 'Done' );                                               # report end of processing
463 }                                                                                           #end sub updateTileData
464 #===== Copyright 2010, Webpraxis Consulting Ltd. - ALL RIGHTS RESERVED - Email: webpraxis@gmail.com ============================
465 # end of prep4PhotographicMosaic(threaded).pl
			

Code Details

Benchmark

To gauge the effectiveness of the script, a simple job was repeated consecutively up to twelve times on two different XP platforms for various numbers of threads. Each computer was started in its normal mode and not re-booted between each run. It is hoped that this approach yields a more realistic assessment. On both platforms, the screen saver was disabled and the only user-launched application was the command window with a screen width of 120 characters. Before each repetition, the same basic starting conditions were ensured in that the images had their read-only attribute cleared and the target tile directory was deleted. A single image set was the benchmark's test subject. It consisted of 26 files in two subfolders, ranging in size from 71 KiB to 5.33 MiB, with an average size of 741.5 KiB.
The two platforms used have different versions of XP (Professional SP3 versus Home SP2) and different installed software applications. Consequently, they are subject to different CPU contentions by their extraneous OS and background application threads. So, in order to account for these differences, the following table lists the reported throughputs with the highest and lowest values discarded in the statistical calculations:
Throughputs (KiB/sec) for 26 images / 18.82 MiB
Dell Dimension E520 [1] HP Pavillion a450n [2]
No. of Secondary Threads (N+1) No. of Secondary Threads (N+1)
1 + 1 2 + 1 3 + 1 1 + 1 2 + 1 3 + 1
Run #1 146.13 256.30 266.14 154.63 173.12 175.13
Run #2 146.59 268.11 249.46 159.16 177.91 175.03
Run #3 146.71 261.46 266.14 148.26 176.64 175.51
Run #4 146.60 259.43 247.36 158.79 173.70 172.61
Run #5 146.46 239.95 254.61 158.44 177.55 175.16
Run #6 146.32 261.18 250.32 159.08 176.41 174.42
Run #7 146.39 243.84 250.07 158.91 173.70 173.68
Run #8 146.43 258.94 248.91 159.43 170.68 176.03
Run #9 146.48 260.30 248.96 159.00 175.66 171.98
Run #10 146.53 256.94 251.65 159.16 173.65 172.80
Run #11 146.46 265.91 258.39 158.83 172.25 174.81
Run #12 146.45 265.11 259.92 159.24 177.17 172.54
Average 146.47 258.94 253.84 158.52 174.99 174.17
Std. Dev. 0.09 6.14 5.85 1.39 1.90 1.16
[1] Dell Dimension E520:
  • Intel® Pentium® D 805 (2.66 GHz dual core, 533 MHz FSB, 1 MiB L2 - no hyper-threading)
  • 1 GiB RAM
  • Seagate Technology 250 GiB Barracuda: SATA II, 7,200 RPM
  • MS Windows® XP Professional SP3 (supports Symmetric Multiprocessing (SMP))
  • Number of processors reported by the environment = 2
[2] HP Pavillion a450n:
  • Intel® Pentium® 4 (3.00 GHz single core, 800 MHz FSB, 1 MiB L2 - with hyper-threading)
  • 1 GiB RAM
  • Western Digital 160 GiB Caviar: EIDE, 7,200 RPM
  • MS Windows® XP Home SP2 (does not support SMP)
  • Number of processors reported by the environment = 2
In the "1+1" scenario (1 tile creator thread and 1 data-updater thread respectively), the CPU usage for both computers is between 50 and 60%. The hyper-threading, single core HP beats out the non hyper-threading, dual core Dell. Its better performance can be attributed to its higher clock and front-bus speeds along with the fact that the data-updater thread places little computational demands.
The results change dramatically in the "2+1" scenario. The CPU usage for both computers is now at 100%. The fluctuations in the throughputs are higher because contention by all threads has increased.This is especially the case for the Dell. However its throughput increases by roughly 77% on average versus the HP's 11%. From this point on in the benchmark, "real silicon" beats out "virtual silicon".
For completeness, a "3+1" scenario was tested. As expected, the performance of both platforms drops slightly. All threads are now waiting longer for their allocation of the execution cycles by the system scheduler. However, having an extra tile-creator thread executing does appear to attenuate the throughput fluctuations. Finally, when compared to the previous scenario, the results show that using the number of processors reported in the environment for the number of tile-creator threads is optimal on average for both computers.
The above simple benchmark clearly demonstrates the advantage of concurrent routines. Our current collection of 3,111 tiles was generated on the Dell using a "2+1" scenario. With a throughput of 260.08 KiB/sec, 5.47 GiB were processed in roughly 6.13 hrs. One can thus surmise that the HP would have taken around 9.1 hours to accomplish this same task. It is therefore well worth taking the time and effort to exploit the computational capabilities of a true multi-core environment.
Update: April 15, 2011

Having just finished building and testing a 3.8GHz(OC) hexacore system, I was of course itching to have it generate my tile collection. With Windows 7 (64-bit) as the OS, I installed ActivePerl v5.12.3.1204 (MSWin32- x64) along with ImageMagick v6.6.9-4-Q16 (Windows-x64). The resulting performance was breathtaking. With a throughput of 2.02 MiB/sec under the default "6+1" scenario, it only took 46.1 mins to accomplish the task. Sweet!

Such architecture and performance really opens up the door to devising new algorithms, like tile matching in a product metric space. But, I'm getting way ahead of myself and will thus have to leave this possibility until another day. It's hard to reign in one's imagination in view of the above result. And then you've got Intel having just announced their Xeon E7 chips with up to 10 hyperthreading cores enabling 20 simultaneous threads. May I live long enough to explore and exploit all these developments.


Our next article presents the script img2PhotographicMosaic(threaded).pl, a threaded image filter that will use the aforementioned tiles to create a photographic mosaic. In the meantime, if you have any questions regarding the code or my explanations, please do not hesitate in contacting me.