To pursue my exploration of Perl's threading implementation, I decided to turn my attention to a classic problem in numerical analysis, namely, estimating the solution of Poisson's equation in a rectangular region. In order to put the focus on the coding, Liebmann's method was used [Gerald, p.196] (also referred to as the Unextrapolated Liebmann or Gauss-Seidel method [Smith, p.147]). Though this algorithm is not recommended for production coding, it is by no means trivial. It offers ample possibilities in devising various parallel-processing variants.
The following discussion is based on Burden and Faires [pp. 568-569]. Poisson's equation is the elliptic partial-differential equation
The objective is to estimate the solution at all the interior mesh points with coordinates (xi , yj ), where, for specified integers n and m,
Liebmann's method then consists of using the Gauss-Seidel iterative scheme to solve the set of central-difference equations given by
In a purely sequential iteration scheme, the mesh points are systematically scanned from left to right along successive rows under the control of a pair of nested loops. In devising a thread-based algorithm, I considered a block-iterative formulation, where the domain R is partitioned into a number of blocks. For the purpose of this article, the formulation is constrained to vertical strips, each extending from the top to bottom. Each strip is then iteratively processed by a separate thread using the Gauss-Seidel method. The optimal number of partitions is thus related to the number of simultaneous threads that can be supported by a computer's architecture and OS.
Controlling the overall process efficiently in a threaded scheme requires a bit of finesse. Launching a thread under Perl entails a fair amount of overhead. Notwithstanding the internal set up, each thread inherits a copy of the calling program's data. And then there's the cleanup after each thread has finished executing. Thus having the calling program decide when to stop the iteration processes comes with a hefty price. For example, consider the following code snippet:
do {
++$NoIterations;
$threads[$_]=threads->create( \&estimateBlock, $Blocks[$_] ) for ( 0..$#Blocks );
$maxdev = $threads[0]->join();
$maxdev = &$Maximum( $maxdev, $threads[$_]->join() ) for ( 1..$#Blocks );
} while $maxdev > $Tol;
Each thread returns the maximum deviation in the estimates for its block after each iteration cycle. The calling program
in turn computes the overall deviation and checks it against the specified tolerance in the deviations. If not met, then
each thread must be relaunched until convergence is achieved. For our test problem, there's roughly a 4:1 ratio
in the time spent managing the threads versus carrying out the actual numerical computations! Needless to say, this is not a
compelling approach.
A much more efficient scheme is to launch the threads once and only once and let them decide when to stop. This is achieved by means of thread signaling. Whichever thread finishes its iteration cycle last must then also compute the overall maximum deviation. The threads that finished earlier simply wait for a signal from this last thread to resume. All of them then either perform another iteration cycle or stop depending on a boolean flag set by the last thread.
The Perl script liebmann_threaded.pl, displayed below, implements this approach. It is released for personal, non-commercial and non-profit use only. The listing includes the line numbers in order to reference them in the following general remarks.
perl liebmann_threaded.pl 3 1
which requests the use of three partitions (or, equivalently, 3 threads) and the results be displayed slowly. Setting the "SLOW" parameter allows one to witness the various threads iterating, waiting, resuming and stopping. For example, typical views for the above invocation would be
when all threads are iterating, with their spinner being updated after each new estimate has been calculated,
when Thread #2 goes into a wait state, having accomplished its iteration cycle first,
with Threads #0 and #2 standing by, Thread #1 is the last to finish so it must also compute the maximum
deviation and broadcast the resume signal,
after all the threads have halted.
One last important observation must be made. Running this code can result in the number of iterations required varying between 61 and 63. This is certainly the case on a single-core system. This may seem disconcerting at first given that a purely sequential algorithm never deviates in the number of iterations required in successive runs. Such an algorithm will always estimate the values in the same order, using the same mixture of old and new values. Not so with this threaded version. Due to process starvation, there's a certain amount of randomness in the computation order for the estimates on to the adjacent boundaries of contiguous blocks. In turn, this implies a random mixture of old and new estimate values coming into play at each iteration cycle. For example, let's say one thread is assigned to update the estimate w(i, j) whereas another one is responsible the adjacent one w(i+1, j). Recalling that each estimate is a function of the other, then it's never clear whether an estimate will use the other's old value or its new one. It all depends on which thread gets to do its computations first, regardless of any lock. In the worst case scenario, the iteration scheme for a thread could degenerate to a hybrid of the Jacobi and Gauss-Seidel methods where more old values are used instead of new ones.
As always, if you have any questions regarding the code or my explanations, please do not hesitate in contacting me.
001 use strict;
002 use Time::HiRes qw(gettimeofday tv_interval sleep);
003 my $Start_time = [gettimeofday];
004 use Term::ANSIScreen qw/:color :cursor :screen/;
005 use Win32::Console::ANSI qw/ Cursor /;
006 use threads;
007 use threads::shared;
008 use constant FATAL => "\aFATAL ERROR: ";
009 #===== Copyright 2009, Webpraxis Consulting Ltd. - ALL RIGHTS RESERVED - Email: webpraxis@gmail.com ============================
010 # liebmann_threaded.pl: Threaded solution of Poisson's equation on a rectangular region using Liebmann's method.
011 #===============================================================================================================================
012 # Usage : perl liebmann_threaded.pl NOBLOCKS SLOW
013 # Arguments : NOBLOCKS = number of domain partitions.
014 # SLOW = boolean flag for slowing down the iteration process
015 # Input Files : None.
016 # Output Files : None.
017 # Temporary Files : None.
018 # Remarks : See http://www.webpraxis.ab.ca/poisson/liebmann_threaded.shtml for details.
019 # History : v1.0.0 - January 20, 2009 - Original release.
020 #===============================================================================================================================
021 # 0) DEFINE THE PROBLEM PARTICULARS:
022 my $F = sub { my( $x, $y ) = @_; return $x * exp $y; }; #anonymous sub for rhs of Poisson equation
023 my $G = sub { my( $x, $y ) = @_; return $x * exp $y; }; #anonymous sub for boundary conditions
024 my $Xa = 0.; #domain boundary: lower endoint on the x-axis
025 my $Xb = 2.; #domain boundary: upper endoint on the x-axis
026 my $Yc = 0.; #domain boundary: lower endoint on the y-axis
027 my $Yd = 1.; #domain boundary: upper endoint on the y-axis
028 my $N = 6; #number of grid columns
029 my $M = 5; #number of grid rows
030 my $Tol = 1.e-10; #tolerance in successive estimates
031 my $MaxIterations = 100; #maximum number of iterations to attempt
032 #-------------------------------------------------------------------------------------------------------------------------------
033 # 1) INITIALIZE:
034 $| = 1; #set STDOUT buffer to auto-flush
035 cls(); #clear screen
036 print colored ['black on white'], "$0\n\n\n", color 'reset'; #display program name
037
038 my $NoBlocks = shift || die FATAL, 'Number of partitions not specified'; #get number of domain blocks
039 my $Slow = shift; #get flag for slowing down the iterations
040
041 my $H = ( $Xb - $Xa ) / $N; #compute mesh size along x-axis
042 my $K = ( $Yd - $Yc ) / $M; #compute mesh size along y-axis
043 my $H2 = $H * $H; #define computational constant
044 my $Lambda = $H2 / ( $K * $K ); #define computational constant
045 my $Mu = 2. * ( 1. + $Lambda ); #define computational constant
046
047 print " Problem Details:\n\n", #echo problem details
048 "\t Grid Size: N = $N, M = $M => ", ($N-1) * ($M-1), " interior mesh points\n",
049 "\t No of partitions: $NoBlocks\n",
050 "\t Xa: $Xa\n",
051 "\t Xb: $Xb\n",
052 "\t H: $H\n",
053 "\t Yc: $Yc\n",
054 "\t Yd: $Yd\n",
055 "\t K: $K\n",
056 "\t Tolerance: $Tol\n",
057 "\tMax no of iterations: $MaxIterations\n",
058 "\t Slow Motion: ", ($Slow) ? 'Yes' : 'No', "\n\n",
059 '-' x 80, "\n\n";
060 die FATAL, "N must exceed the number of partitions" unless $N > $NoBlocks; #check for minimum grid size
061 die FATAL, "M must exceed 1" unless $M > 1;
062 #-------------------------------------------------------------------------------------------------------------------------------
063 # 2) DEFINE THE INTERIOR MESH POINTS, COMPUTE BOUNDARY VALUES & INITIALIZE SOLUTION ESTIMATE:
064 my @X :shared; #array for the interior mesh point x-values
065 my @Y :shared; #array for the interior mesh point y-values
066 my @W :shared; #array for the boundary values & solution estimates
067 $W[$_] = &share([]) for ( 0..$N ); #extend as array of arrays by adding shared leaf nodes
068
069 $X[$_] = $Xa + $_ * $H for ( 1..$N - 1 ); #compute interior mesh point x-values
070 $Y[$_] = $Yc + $_ * $K for ( 1..$M - 1 ); #compute interior mesh point y-values
071
072 for ( 1..$N - 1 ) { #repeat for each interior horizontal node value
073 $W[$_][$M] = &$G( $X[$_], $Yd ); # compute top boundary condition
074 $W[$_][ 0] = &$G( $X[$_], $Yc ); # compute bottom boundary condition
075 } #until all values processed
076
077 for ( 1..$M - 1 ) { #repeat for each interior vertical node value
078 $W[ 0][$_] = &$G( $Xa, $Y[$_] ); # compute left boundary condition
079 $W[$N][$_] = &$G( $Xb, $Y[$_] ); # compute right boundary condition
080 } #until all values processed
081
082 for my $i ( 1..$N - 1 ) { #repeat for each interior mesh point
083 $W[$i][$_] = 0. for ( 1..$M - 1 ); # init all estimates
084 } #until all interior mesh points processed
085 #-------------------------------------------------------------------------------------------------------------------------------
086 # 3) PARTITION DOMAIN INTO VERTICAL STRIPS; TRY TO BALANCE THE IMPENDING WORK LOAD:
087 my @Blocks; #array of hashes for the block node boundaries
088 { #begin naked block
089 my $blockIdx = 0; # block index
090 my @noVLines = ( int( ($N - 1) / $NoBlocks ) ) x $NoBlocks; # init number of vertical grid lines per block
091
092 PARTITIONS: { # begin domain partitioning
093 $Blocks[0 ] = { TOP => $M - 1, BOTTOM => 1, LEFT => 1, RIGHT => $noVLines[0] };
094 $Blocks[$_] = { TOP => $M - 1, BOTTOM => 1, LEFT => $Blocks[$_-1]{RIGHT} + 1, RIGHT => $Blocks[$_-1]{RIGHT} + $noVLines[$_] }
095 for( 1..$NoBlocks - 1 );
096
097 ++$noVLines[$blockIdx++], redo PARTITIONS # increase width of one block at the time
098 if $Blocks[$#Blocks]{RIGHT} < $N - 1; # until all interior nodes allocated
099 } # end domain partitioning
100 } #end naked block
101 #-------------------------------------------------------------------------------------------------------------------------------
102 # 4) ITERATE THE BLOCKS:
103 my $CursorY; #cursor row coordinate
104 my @Spinners = ( '-', '\\', '|', '/' ); #define symbols for spinner
105 my @threads; #array of threads
106 #variables shared between calling program and threads:
107 my $MaxDev :shared; # maximum deviation in successive estimates
108 my $NoIterations :shared; # counter for number of iterations used
109 #variables shared between threads only:
110 my $Halt :shared; # boolean flag for halting the threads
111 my $NoBlocksDone :shared; # counter for number of partitions processed
112
113 print " Process Details:\n\n"; #inform user of processing start
114 $CursorY = ( Cursor() )[1]; #record cursor row position
115
116 $threads[$_] = threads->create( \&estimateBlock, $_ ) for ( 0..$#Blocks ); #launch a thread for each partition
117 $threads[$_]->join() for ( 0..$#Blocks ); #wait for threads to return
118
119 print locate( ++$CursorY + $NoBlocks, 1 ), '-' x 80, "\n\n";
120 die FATAL, "Iteration limit reached (MaxDev = $MaxDev).\n" if $MaxDev > $Tol; #halt if iteration limit reached
121 #-------------------------------------------------------------------------------------------------------------------------------
122 # 5) OUTPUT THE ESTIMATES, EXACT VALUES AND ERROR NORMS:
123 print " $NoIterations Iterations Required:\n\n"; #report no of iterations
124 printf "\t%3s %3s %1s %1s %1s %1s %7s\n", #output column headings
125 'i','j','x','y','w','u','|u - w|';
126 print "\t", '-' x 55, "\n";
127 for my $i ( 1..$N - 1 ) { #repeat for each interior horizontal node value
128 for my $j ( 1..$M - 1 ) { # repeat for each interior vertical node value
129 my $u = &$F( $X[$i],$Y[$j] ); # compute exact solution
130 my $e = abs( $u - $W[$i][$j] ); # compute error norm
131 printf "\t%3s %3s %7.5f %7.5f %7.5f %7.5f %4.2e\n", # output values
132 $i,$j,$X[$i],$Y[$j],$W[$i][$j],$u,$e;
133 } # until all vertical values processed
134 } #until all horizontal values processed
135 print "\n\tElapsed time: ", tv_interval( $Start_time ), "\n"; #report total elapsed time
136 exit; #end processing
137 #===== SUBROUTINES =============================================================================================================
138 # Usage : &estimateBlock( %BLOCKIDX );
139 # Purpose : Estimate the solution for a specified partition using Liebmann's method.
140 # Arguments : %BLOCKIDX = partition index value
141 # Externals : $CursorY, $H2, $Lambda, $Mu, $NoBlocks, $Slow, @Blocks, @Spinners
142 # Shared Externals : $Halt, $MaxDev, $NoBlocksDone, $NoIterations, @W, @X, @Y
143 # Subs : $F
144 # Remarks : None.
145 # History : v1.0.0 - January 20, 2009 - Original release.
146
147 sub estimateBlock { #begin sub
148 my $blockIdx = shift; # parameterize the argument
149 my %block = %{ $Blocks[ $blockIdx ] }; # parameterize the block boundaries
150 my $maximum = sub { my( $x, $y ) = @_; return ($x < $y ) ? $y : $x; }; # anonymous sub for computing the maximum of 2 numbers
151 my $spinnerIdx = 0; # init spinner index
152 my $spinner; # spinner
153
154 do { # repeat
155 my $maxBlockDev = 0; # init maximum deviation for the block
156 for my $i ( $block{LEFT}..$block{RIGHT} ) { # repeat for each block horizontal node value
157 for my $j ( $block{BOTTOM}..$block{TOP} ) { # repeat for each block vertical node value
158 $maxBlockDev = &$maximum( $maxBlockDev, &estimateNode( $i, $j ) ); # estimate solution at node & update maximum deviation
159 $spinnerIdx = ++$spinnerIdx % 4; # update spinner index
160 $spinner = "(" . $Spinners[$spinnerIdx] . ")"; # update processing spinner
161 print locate( $CursorY + $blockIdx, 3 ), clline, "Partition #$blockIdx: ",
162 colored ['bold green'], "Iterating " , $spinner;
163 sleep( 0.2 ) if $Slow; # sleep if slowdown requested
164 } # until all vertical values processed
165 } # until all horizontal values processed
166
167 lock( $NoBlocksDone ); # lock counter for no of blocks processed
168 ++$NoBlocksDone; # increment counter
169 $MaxDev = &$maximum( $MaxDev, $maxBlockDev ); # update maximum deviation across all blocks
170 if( $NoBlocksDone == $NoBlocks ) { # if this was the last block to be processed
171 ++$NoIterations; # update iteration counter
172 print locate( $CursorY + $blockIdx, 3 ), clline, "Partition #$blockIdx: ",
173 colored ['bold yellow'], "Checking after $NoIterations iteration(s); MaxDev = $MaxDev";
174 sleep( 1.4 ) if $Slow; # sleep if slowdown requested
175 $Halt = ($MaxDev <= $Tol) || ($NoIterations == $MaxIterations); # update stopping criterion
176 $NoBlocksDone = $MaxDev = 0 unless $Halt; # reset control variables if won't be stopping
177 cond_broadcast( $NoBlocksDone ); # signal other threads to resume
178 } else { # else
179 print locate( $CursorY + $blockIdx, 3 ), clline, "Partition #$blockIdx: ",
180 colored ['bold red'], "Waiting...";
181 cond_wait( $NoBlocksDone ); # wait for remaining threads to finish estimating
182 } # end if-else
183 } until $Halt; # until stopping criterion in effect
184 print locate( $CursorY + $blockIdx, 3 ), clline, "Partition #$blockIdx: ",
185 colored ['bold red'], "Stopped";
186
187 # Usage : &estimateNode( I, J );
188 # Purpose : Estimates the solution at a specified grid node. Returns the deviation.
189 # Arguments : I = horizontal node value
190 # J = vertical node value
191 # Externals : $H2, $Lambda, $Mu
192 # Shared Externals : @W, @X, @Y
193 # Subs : $F
194 # History : v1.0.0 - January 19, 2009 - Original release.
195
196 sub estimateNode { # begin sub
197 my $i = shift; # parameterize horizontal node value
198 my $j = shift; # parameterize vertical node value
199
200 my $oldW = $W[$i][$j]; # save previous estimate at current node
201 $W[$i][$j] = ( - $H2 * &$F( $X[$i],$Y[$j] ) + $W[$i+1][$j] + $W[$i-1][$j] # compute new estimate
202 + $Lambda * ( $W[$i][$j+1] + $W[$i][$j-1] )
203 ) / $Mu;
204 return abs( $W[$i][$j] - $oldW ); # return the deviation
205 } # end sub estimateNode
206 } #end sub estimateBlock
207 #===== Copyright 2009, Webpraxis Consulting Ltd. - ALL RIGHTS RESERVED - Email: webpraxis@gmail.com ============================
208 # end of liebmann_threaded.pl
© 2012 Webpraxis Consulting Ltd. – ALL RIGHTS RESERVED.