To pursue my exploration of Perl's threading implementation, I decided to turn my attention to a classic problem in numerical analysis, namely, estimating the solution of Poisson's equation in a rectangular region. In order to put the focus on the coding, Liebmann's method was used [Gerald, p.196] (also referred to as the Unextrapolated Liebmann or Gauss-Seidel method [Smith, p.147]). Though this algorithm is not recommended for production coding, it is by no means trivial. It offers ample possibilities in devising various parallel-processing variants.
The following discussion is based on Burden and Faires [pp. 568-569]. Poisson's equation is the elliptic partial-differential equation
The objective is to estimate the solution at all the interior mesh points with coordinates (xi , yj ), where, for specified integers n and m,
Liebmann's method then consists of using the Gauss-Seidel iterative scheme to solve the set of central-difference equations given by
In a purely sequential iteration scheme, the mesh points are systematically scanned from left to right along successive rows under the control of a pair of nested loops. In devising a thread-based algorithm, I considered a block-iterative formulation, where the domain R is partitioned into a number of blocks. For the purpose of this article, the formulation is constrained to vertical strips, each extending from the top to bottom. Each strip is then iteratively processed by a separate thread using the Gauss-Seidel method. The optimal number of partitions is thus related to the number of simultaneous threads that can be supported by a computer's architecture and OS.
Controlling the overall process efficiently in a threaded scheme requires a bit of finesse. Launching a thread under Perl entails a fair amount of overhead. Notwithstanding the internal set up, each thread inherits a copy of the calling program's data. And then there's the cleanup after each thread has finished executing. Thus having the calling program decide when to stop the iteration processes comes with a hefty price. For example, consider the following code snippet:
do { ++$NoIterations; $threads[$_]=threads->create( \&estimateBlock, $Blocks[$_] ) for ( 0..$#Blocks ); $maxdev = $threads[0]->join(); $maxdev = &$Maximum( $maxdev, $threads[$_]->join() ) for ( 1..$#Blocks ); } while $maxdev > $Tol;Each thread returns the maximum deviation in the estimates for its block after each iteration cycle. The calling program in turn computes the overall deviation and checks it against the specified tolerance in the deviations. If not met, then each thread must be relaunched until convergence is achieved. For our test problem, there's roughly a 4:1 ratio in the time spent managing the threads versus carrying out the actual numerical computations! Needless to say, this is not a compelling approach.
A much more efficient scheme is to launch the threads once and only once and let them decide when to stop. This is achieved by means of thread signaling. Whichever thread finishes its iteration cycle last must then also compute the overall maximum deviation. The threads that finished earlier simply wait for a signal from this last thread to resume. All of them then either perform another iteration cycle or stop depending on a boolean flag set by the last thread.
The Perl script liebmann_threaded.pl, displayed below, implements this approach. It is released for personal, non-commercial and non-profit use only. The listing includes the line numbers in order to reference them in the following general remarks.
perl liebmann_threaded.pl 3 1
which requests the use of three partitions (or, equivalently, 3 threads) and the results be displayed slowly. Setting the "SLOW" parameter allows one to witness the various threads iterating, waiting, resuming and stopping. For example, typical views for the above invocation would beOne last important observation must be made. Running this code can result in the number of iterations required varying between 61 and 63. This is certainly the case on a single-core system. This may seem disconcerting at first given that a purely sequential algorithm never deviates in the number of iterations required in successive runs. Such an algorithm will always estimate the values in the same order, using the same mixture of old and new values. Not so with this threaded version. Due to process starvation, there's a certain amount of randomness in the computation order for the estimates on to the adjacent boundaries of contiguous blocks. In turn, this implies a random mixture of old and new estimate values coming into play at each iteration cycle. For example, let's say one thread is assigned to update the estimate w(i, j) whereas another one is responsible the adjacent one w(i+1, j). Recalling that each estimate is a function of the other, then it's never clear whether an estimate will use the other's old value or its new one. It all depends on which thread gets to do its computations first, regardless of any lock. In the worst case scenario, the iteration scheme for a thread could degenerate to a hybrid of the Jacobi and Gauss-Seidel methods where more old values are used instead of new ones.
As always, if you have any questions regarding the code or my explanations, please do not hesitate in contacting me.
001 use strict; 002 use Time::HiRes qw(gettimeofday tv_interval sleep); 003 my $Start_time = [gettimeofday]; 004 use Term::ANSIScreen qw/:color :cursor :screen/; 005 use Win32::Console::ANSI qw/ Cursor /; 006 use threads; 007 use threads::shared; 008 use constant FATAL => "\aFATAL ERROR: "; 009 #===== Copyright 2009, Webpraxis Consulting Ltd. - ALL RIGHTS RESERVED - Email: webpraxis@gmail.com ============================ 010 # liebmann_threaded.pl: Threaded solution of Poisson's equation on a rectangular region using Liebmann's method. 011 #=============================================================================================================================== 012 # Usage : perl liebmann_threaded.pl NOBLOCKS SLOW 013 # Arguments : NOBLOCKS = number of domain partitions. 014 # SLOW = boolean flag for slowing down the iteration process 015 # Input Files : None. 016 # Output Files : None. 017 # Temporary Files : None. 018 # Remarks : See http://www.webpraxis.ab.ca/poisson/liebmann_threaded.shtml for details. 019 # History : v1.0.0 - January 20, 2009 - Original release. 020 #=============================================================================================================================== 021 # 0) DEFINE THE PROBLEM PARTICULARS: 022 my $F = sub { my( $x, $y ) = @_; return $x * exp $y; }; #anonymous sub for rhs of Poisson equation 023 my $G = sub { my( $x, $y ) = @_; return $x * exp $y; }; #anonymous sub for boundary conditions 024 my $Xa = 0.; #domain boundary: lower endoint on the x-axis 025 my $Xb = 2.; #domain boundary: upper endoint on the x-axis 026 my $Yc = 0.; #domain boundary: lower endoint on the y-axis 027 my $Yd = 1.; #domain boundary: upper endoint on the y-axis 028 my $N = 6; #number of grid columns 029 my $M = 5; #number of grid rows 030 my $Tol = 1.e-10; #tolerance in successive estimates 031 my $MaxIterations = 100; #maximum number of iterations to attempt 032 #------------------------------------------------------------------------------------------------------------------------------- 033 # 1) INITIALIZE: 034 $| = 1; #set STDOUT buffer to auto-flush 035 cls(); #clear screen 036 print colored ['black on white'], "$0\n\n\n", color 'reset'; #display program name 037 038 my $NoBlocks = shift || die FATAL, 'Number of partitions not specified'; #get number of domain blocks 039 my $Slow = shift; #get flag for slowing down the iterations 040 041 my $H = ( $Xb - $Xa ) / $N; #compute mesh size along x-axis 042 my $K = ( $Yd - $Yc ) / $M; #compute mesh size along y-axis 043 my $H2 = $H * $H; #define computational constant 044 my $Lambda = $H2 / ( $K * $K ); #define computational constant 045 my $Mu = 2. * ( 1. + $Lambda ); #define computational constant 046 047 print " Problem Details:\n\n", #echo problem details 048 "\t Grid Size: N = $N, M = $M => ", ($N-1) * ($M-1), " interior mesh points\n", 049 "\t No of partitions: $NoBlocks\n", 050 "\t Xa: $Xa\n", 051 "\t Xb: $Xb\n", 052 "\t H: $H\n", 053 "\t Yc: $Yc\n", 054 "\t Yd: $Yd\n", 055 "\t K: $K\n", 056 "\t Tolerance: $Tol\n", 057 "\tMax no of iterations: $MaxIterations\n", 058 "\t Slow Motion: ", ($Slow) ? 'Yes' : 'No', "\n\n", 059 '-' x 80, "\n\n"; 060 die FATAL, "N must exceed the number of partitions" unless $N > $NoBlocks; #check for minimum grid size 061 die FATAL, "M must exceed 1" unless $M > 1; 062 #------------------------------------------------------------------------------------------------------------------------------- 063 # 2) DEFINE THE INTERIOR MESH POINTS, COMPUTE BOUNDARY VALUES & INITIALIZE SOLUTION ESTIMATE: 064 my @X :shared; #array for the interior mesh point x-values 065 my @Y :shared; #array for the interior mesh point y-values 066 my @W :shared; #array for the boundary values & solution estimates 067 $W[$_] = &share([]) for ( 0..$N ); #extend as array of arrays by adding shared leaf nodes 068 069 $X[$_] = $Xa + $_ * $H for ( 1..$N - 1 ); #compute interior mesh point x-values 070 $Y[$_] = $Yc + $_ * $K for ( 1..$M - 1 ); #compute interior mesh point y-values 071 072 for ( 1..$N - 1 ) { #repeat for each interior horizontal node value 073 $W[$_][$M] = &$G( $X[$_], $Yd ); # compute top boundary condition 074 $W[$_][ 0] = &$G( $X[$_], $Yc ); # compute bottom boundary condition 075 } #until all values processed 076 077 for ( 1..$M - 1 ) { #repeat for each interior vertical node value 078 $W[ 0][$_] = &$G( $Xa, $Y[$_] ); # compute left boundary condition 079 $W[$N][$_] = &$G( $Xb, $Y[$_] ); # compute right boundary condition 080 } #until all values processed 081 082 for my $i ( 1..$N - 1 ) { #repeat for each interior mesh point 083 $W[$i][$_] = 0. for ( 1..$M - 1 ); # init all estimates 084 } #until all interior mesh points processed 085 #------------------------------------------------------------------------------------------------------------------------------- 086 # 3) PARTITION DOMAIN INTO VERTICAL STRIPS; TRY TO BALANCE THE IMPENDING WORK LOAD: 087 my @Blocks; #array of hashes for the block node boundaries 088 { #begin naked block 089 my $blockIdx = 0; # block index 090 my @noVLines = ( int( ($N - 1) / $NoBlocks ) ) x $NoBlocks; # init number of vertical grid lines per block 091 092 PARTITIONS: { # begin domain partitioning 093 $Blocks[0 ] = { TOP => $M - 1, BOTTOM => 1, LEFT => 1, RIGHT => $noVLines[0] }; 094 $Blocks[$_] = { TOP => $M - 1, BOTTOM => 1, LEFT => $Blocks[$_-1]{RIGHT} + 1, RIGHT => $Blocks[$_-1]{RIGHT} + $noVLines[$_] } 095 for( 1..$NoBlocks - 1 ); 096 097 ++$noVLines[$blockIdx++], redo PARTITIONS # increase width of one block at the time 098 if $Blocks[$#Blocks]{RIGHT} < $N - 1; # until all interior nodes allocated 099 } # end domain partitioning 100 } #end naked block 101 #------------------------------------------------------------------------------------------------------------------------------- 102 # 4) ITERATE THE BLOCKS: 103 my $CursorY; #cursor row coordinate 104 my @Spinners = ( '-', '\\', '|', '/' ); #define symbols for spinner 105 my @threads; #array of threads 106 #variables shared between calling program and threads: 107 my $MaxDev :shared; # maximum deviation in successive estimates 108 my $NoIterations :shared; # counter for number of iterations used 109 #variables shared between threads only: 110 my $Halt :shared; # boolean flag for halting the threads 111 my $NoBlocksDone :shared; # counter for number of partitions processed 112 113 print " Process Details:\n\n"; #inform user of processing start 114 $CursorY = ( Cursor() )[1]; #record cursor row position 115 116 $threads[$_] = threads->create( \&estimateBlock, $_ ) for ( 0..$#Blocks ); #launch a thread for each partition 117 $threads[$_]->join() for ( 0..$#Blocks ); #wait for threads to return 118 119 print locate( ++$CursorY + $NoBlocks, 1 ), '-' x 80, "\n\n"; 120 die FATAL, "Iteration limit reached (MaxDev = $MaxDev).\n" if $MaxDev > $Tol; #halt if iteration limit reached 121 #------------------------------------------------------------------------------------------------------------------------------- 122 # 5) OUTPUT THE ESTIMATES, EXACT VALUES AND ERROR NORMS: 123 print " $NoIterations Iterations Required:\n\n"; #report no of iterations 124 printf "\t%3s %3s %1s %1s %1s %1s %7s\n", #output column headings 125 'i','j','x','y','w','u','|u - w|'; 126 print "\t", '-' x 55, "\n"; 127 for my $i ( 1..$N - 1 ) { #repeat for each interior horizontal node value 128 for my $j ( 1..$M - 1 ) { # repeat for each interior vertical node value 129 my $u = &$F( $X[$i],$Y[$j] ); # compute exact solution 130 my $e = abs( $u - $W[$i][$j] ); # compute error norm 131 printf "\t%3s %3s %7.5f %7.5f %7.5f %7.5f %4.2e\n", # output values 132 $i,$j,$X[$i],$Y[$j],$W[$i][$j],$u,$e; 133 } # until all vertical values processed 134 } #until all horizontal values processed 135 print "\n\tElapsed time: ", tv_interval( $Start_time ), "\n"; #report total elapsed time 136 exit; #end processing 137 #===== SUBROUTINES ============================================================================================================= 138 # Usage : &estimateBlock( %BLOCKIDX ); 139 # Purpose : Estimate the solution for a specified partition using Liebmann's method. 140 # Arguments : %BLOCKIDX = partition index value 141 # Externals : $CursorY, $H2, $Lambda, $Mu, $NoBlocks, $Slow, @Blocks, @Spinners 142 # Shared Externals : $Halt, $MaxDev, $NoBlocksDone, $NoIterations, @W, @X, @Y 143 # Subs : $F 144 # Remarks : None. 145 # History : v1.0.0 - January 20, 2009 - Original release. 146 147 sub estimateBlock { #begin sub 148 my $blockIdx = shift; # parameterize the argument 149 my %block = %{ $Blocks[ $blockIdx ] }; # parameterize the block boundaries 150 my $maximum = sub { my( $x, $y ) = @_; return ($x < $y ) ? $y : $x; }; # anonymous sub for computing the maximum of 2 numbers 151 my $spinnerIdx = 0; # init spinner index 152 my $spinner; # spinner 153 154 do { # repeat 155 my $maxBlockDev = 0; # init maximum deviation for the block 156 for my $i ( $block{LEFT}..$block{RIGHT} ) { # repeat for each block horizontal node value 157 for my $j ( $block{BOTTOM}..$block{TOP} ) { # repeat for each block vertical node value 158 $maxBlockDev = &$maximum( $maxBlockDev, &estimateNode( $i, $j ) ); # estimate solution at node & update maximum deviation 159 $spinnerIdx = ++$spinnerIdx % 4; # update spinner index 160 $spinner = "(" . $Spinners[$spinnerIdx] . ")"; # update processing spinner 161 print locate( $CursorY + $blockIdx, 3 ), clline, "Partition #$blockIdx: ", 162 colored ['bold green'], "Iterating " , $spinner; 163 sleep( 0.2 ) if $Slow; # sleep if slowdown requested 164 } # until all vertical values processed 165 } # until all horizontal values processed 166 167 lock( $NoBlocksDone ); # lock counter for no of blocks processed 168 ++$NoBlocksDone; # increment counter 169 $MaxDev = &$maximum( $MaxDev, $maxBlockDev ); # update maximum deviation across all blocks 170 if( $NoBlocksDone == $NoBlocks ) { # if this was the last block to be processed 171 ++$NoIterations; # update iteration counter 172 print locate( $CursorY + $blockIdx, 3 ), clline, "Partition #$blockIdx: ", 173 colored ['bold yellow'], "Checking after $NoIterations iteration(s); MaxDev = $MaxDev"; 174 sleep( 1.4 ) if $Slow; # sleep if slowdown requested 175 $Halt = ($MaxDev <= $Tol) || ($NoIterations == $MaxIterations); # update stopping criterion 176 $NoBlocksDone = $MaxDev = 0 unless $Halt; # reset control variables if won't be stopping 177 cond_broadcast( $NoBlocksDone ); # signal other threads to resume 178 } else { # else 179 print locate( $CursorY + $blockIdx, 3 ), clline, "Partition #$blockIdx: ", 180 colored ['bold red'], "Waiting..."; 181 cond_wait( $NoBlocksDone ); # wait for remaining threads to finish estimating 182 } # end if-else 183 } until $Halt; # until stopping criterion in effect 184 print locate( $CursorY + $blockIdx, 3 ), clline, "Partition #$blockIdx: ", 185 colored ['bold red'], "Stopped"; 186 187 # Usage : &estimateNode( I, J ); 188 # Purpose : Estimates the solution at a specified grid node. Returns the deviation. 189 # Arguments : I = horizontal node value 190 # J = vertical node value 191 # Externals : $H2, $Lambda, $Mu 192 # Shared Externals : @W, @X, @Y 193 # Subs : $F 194 # History : v1.0.0 - January 19, 2009 - Original release. 195 196 sub estimateNode { # begin sub 197 my $i = shift; # parameterize horizontal node value 198 my $j = shift; # parameterize vertical node value 199 200 my $oldW = $W[$i][$j]; # save previous estimate at current node 201 $W[$i][$j] = ( - $H2 * &$F( $X[$i],$Y[$j] ) + $W[$i+1][$j] + $W[$i-1][$j] # compute new estimate 202 + $Lambda * ( $W[$i][$j+1] + $W[$i][$j-1] ) 203 ) / $Mu; 204 return abs( $W[$i][$j] - $oldW ); # return the deviation 205 } # end sub estimateNode 206 } #end sub estimateBlock 207 #===== Copyright 2009, Webpraxis Consulting Ltd. - ALL RIGHTS RESERVED - Email: webpraxis@gmail.com ============================ 208 # end of liebmann_threaded.pl
© 2024 Webpraxis Consulting Ltd. – ALL RIGHTS RESERVED.