Webpraxis Consulting Ltd.: liebmann

References

liebmann_threaded.pl

Threaded solution of Poisson's equation on a rectangular region using Liebmann's method.

To pursue my exploration of Perl's threading implementation, I decided to turn my attention to a classic problem in numerical analysis, namely, estimating the solution of Poisson's equation in a rectangular region. In order to put the focus on the coding, Liebmann's method was used [Gerald, p.196] (also referred to as the Unextrapolated Liebmann or Gauss-Seidel method [Smith, p.147]). Though this algorithm is not recommended for production coding, it is by no means trivial. It offers ample possibilities in devising various parallel-processing variants.

The following discussion is based on Burden and Faires [pp. 568-569]. Poisson's equation is the elliptic partial-differential equation

The objective is to estimate the solution at all the interior mesh points with coordinates (x_i , y_j ), where, for specified integers n and m,

Liebmann's method then consists of using the Gauss-Seidel iterative scheme to solve the set of central-difference equations given by

In a purely sequential iteration scheme, the mesh points are systematically scanned from left to right along successive rows under the control of a pair of nested loops. In devising a thread-based algorithm, I considered a block-iterative formulation, where the domain R is partitioned into a number of blocks. For the purpose of this article, the formulation is constrained to vertical strips, each extending from the top to bottom. Each strip is then iteratively processed by a separate thread using the Gauss-Seidel method. The optimal number of partitions is thus related to the number of simultaneous threads that can be supported by a computer's architecture and OS.

Controlling the overall process efficiently in a threaded scheme requires a bit of finesse. Launching a thread under Perl entails a fair amount of overhead. Notwithstanding the internal set up, each thread inherits a copy of the calling program's data. And then there's the cleanup after each thread has finished executing. Thus having the calling program decide when to stop the iteration processes comes with a hefty price. For example, consider the following code snippet:

            do {
                ++$NoIterations;
                $threads[$_]=threads->create( \&estimateBlock, $Blocks[$_] ) for ( 0..$#Blocks );
                $maxdev = $threads[0]->join();
                $maxdev = &$Maximum( $maxdev, $threads[$_]->join() ) for ( 1..$#Blocks );
            } while $maxdev > $Tol;

Each thread returns the maximum deviation in the estimates for its block after each iteration cycle. The calling program in turn computes the overall deviation and checks it against the specified tolerance in the deviations. If not met, then each thread must be relaunched until convergence is achieved. For our test problem, there's roughly a 4:1 ratio in the time spent managing the threads versus carrying out the actual numerical computations! Needless to say, this is not a compelling approach.

A much more efficient scheme is to launch the threads once and only once and let them decide when to stop. This is achieved by means of thread signaling. Whichever thread finishes its iteration cycle last must then also compute the overall maximum deviation. The threads that finished earlier simply wait for a signal from this last thread to resume. All of them then either perform another iteration cycle or stop depending on a boolean flag set by the last thread.

The Perl script liebmann_threaded.pl, displayed below, implements this approach. It is released for personal, non-commercial and non-profit use only. The listing includes the line numbers in order to reference them in the following general remarks.

Lines 001-007: Declaration of the required imports. Note the use of the extra modules Time::HiRes, Term::ANSIScreen and Win32::Console::ANSI. Their use is simply to enhance the program's output.
Lines 012-014: Description of the required command line parameters. An example is

perl liebmann_threaded.pl 3 1
which requests the use of three partitions (or, equivalently, 3 threads) and the results be displayed slowly. Setting the "SLOW" parameter allows one to witness the various threads iterating, waiting, resuming and stopping. For example, typical views for the above invocation would be

when all threads are iterating, with their spinner being updated after each new estimate has been calculated,

when Thread #2 goes into a wait state, having accomplished its iteration cycle first,

with Threads #0 and #2 standing by, Thread #1 is the last to finish so it must also compute the maximum deviation and broadcast the resume signal,

after all the threads have halted.
If set to zero or omitted, then this part of the output will simply be a blur.
Lines 022-031: Defines the particular Poisson equation to be solved, taken from Burden and Faires [pp. 575-576]:

The constant n is set to 6 and m to 5. The tolerance in the maximum deviation between any two successive estimates is 10^-10 as measured by their absolute difference. Finally the maximum number of iterations to attempt is declared arbitrarily to be 100. The purpose of such a variable is to halt the process due to divergence or inaccurate coding of the problem parameters.
Lines 034-061: The program initialization sequence. The first line sets the output buffer to auto-flush in order to ensure that information is printed before any premature program termination. After reading the program arguments, the step sizes are computed along with some computational constants. Finally the results of the set up are echoed.
Lines 063-084: The coordinates of the interior mesh points are defined. Their values are stored in shared arrays just to minimize memory consumption. Next, the shared array of arrays W is initialized with the boundary values and with an initial solution guess of zero for the interior values. This array will be read and written to by all the threads.
Lines 086-100: This section partitions the domain into the number of blocks specified on the command line. The blocks reference the i and j nodes coordinates. Each block is initially assigned the number of vertical grid lines given by the integer part of (n-1)/(# of blocks). If some grid lines remain unassigned, then each block will in turn acquire one more grid line until all are assigned. The aim of this type of redistribution is to equalize, as much as possible, the impending number of calculations by each thread. The actual assignment is carried out in a cascade fashion. Such a scheme will be repeated at most "(n-1) modulo (# of blocks)" times. So, for 3 blocks and with n=6, the main sequence of events would be:
1. Assign 1 grid line to each block so that the left and right block i-node boundaries are
  $Blocks[0]{LEFT} = 1 and $Blocks[0]{RIGHT} = 1,
  $Blocks[1]{LEFT} = 2 and $Blocks[0]{RIGHT} = 2,
  $Blocks[2]{LEFT} = 3 and $Blocks[0]{RIGHT} = 3.
2. That leaves 2 grid lines unassigned. So increase the width of the zeroth block by 1 and repeat the assignment to yield
  $Blocks[0]{LEFT} = 1 and $Blocks[0]{RIGHT} = 2,
  $Blocks[1]{LEFT} = 3 and $Blocks[0]{RIGHT} = 3,
  $Blocks[2]{LEFT} = 4 and $Blocks[0]{RIGHT} = 4.
3. Only 1 grid is now unassigned. So increase the width of the first block by 1 and repeat the assignment to yield
  $Blocks[0]{LEFT} = 1 and $Blocks[0]{RIGHT} = 2,
  $Blocks[1]{LEFT} = 3 and $Blocks[0]{RIGHT} = 4,
  $Blocks[2]{LEFT} = 5 and $Blocks[0]{RIGHT} = 5.
Thus the zeroth and first blocks cover 2 grid lines whilst the second one covers the last one at i=5.
Lines 102-120: With all the preliminary steps done, this section launches the various threads, one for each partition block. The shared variables $MaxDev and $NoIterations are read and written to by the threads but only read by the calling program when reporting the results. The other two shared variables, $Halt and $NoBlocksDone are only read and written by the threads and are key to their ability to self-manage the iteration process.
Lines 122-136: This section reports the results of the iteration if convergence has occurred. It also outputs the difference between the estimates and the exact solution xe^y.
Lines 138-206: This is the main subroutine associated with each thread call.
Lines 154-183: These lines are the outermost loop, which will repeat until either the maximum deviation in the estimates is less than the specified tolerance or the number of maximum number of iterations allowed is reached.
Lines 156-165: These two nested loops define the required node coordinates in order to perform a single cycle of the Gauss-Seidel method for the associated partition block. Each new estimate is computed by Subroutine estimateNode and the current maximum deviation for the block is updated.
Lines 167-182: This is the key section that allows the threads to self-manage the iteration process. It starts with the critical step of getting a lock on the shared variable $NoBlocksDone. Once obtained, a thread can then update part or all of the remaining shared variables. Then the counter $NoBlocksDone is incremented to reflect the number of threads that have finished their iteration cycle. The overall maximum deviation is next updated. If the current thread is not the last one to finish, then it releases the lock on $NoBlocksDone and goes into a wait state. On the other hand, if it is the last to finish, it first increments the iteration counter and then determines whether all the threads should repeat their calculations or stop. The value of the stopping criterion is stored in the shared boolean variable $Halt. Finally, the last thread then broadcasts a signal to the other threads to resume. They then all consult the value of $Halt in order to take the appropriate action.
Lines 187-205: This secondary subroutine is responsible for computing the estimate at a specified node coordinate in accordance with the corresponding central-difference equation. It returns the absolute difference between the new and old values of the estimate. Note that we've chosen to recompute f(x,y) with each call rather than pre-calculate the values and store them in an array, i.e., $F[$i,$j], in order to minimize memory usage (though for such a small grid the point is moot). Moreover, no lock is placed on the array W in order not to impede the execution speed.

One last important observation must be made. Running this code can result in the number of iterations required varying between 61 and 63. This is certainly the case on a single-core system. This may seem disconcerting at first given that a purely sequential algorithm never deviates in the number of iterations required in successive runs. Such an algorithm will always estimate the values in the same order, using the same mixture of old and new values. Not so with this threaded version. Due to process starvation, there's a certain amount of randomness in the computation order for the estimates on to the adjacent boundaries of contiguous blocks. In turn, this implies a random mixture of old and new estimate values coming into play at each iteration cycle. For example, let's say one thread is assigned to update the estimate w(i, j) whereas another one is responsible the adjacent one w(i+1, j). Recalling that each estimate is a function of the other, then it's never clear whether an estimate will use the other's old value or its new one. It all depends on which thread gets to do its computations first, regardless of any lock. In the worst case scenario, the iteration scheme for a thread could degenerate to a hybrid of the Jacobi and Gauss-Seidel methods where more old values are used instead of new ones.

As always, if you have any questions regarding the code or my explanations, please do not hesitate in contacting me.

liebmann_threaded.pl -- (Download latest version - MD5 checksum )

001 use strict;
002 use Time::HiRes qw(gettimeofday tv_interval sleep);
003 my $Start_time = [gettimeofday];
004 use Term::ANSIScreen qw/:color :cursor :screen/;
005 use Win32::Console::ANSI qw/ Cursor /;
006 use threads;
007 use threads::shared;
008 use constant FATAL => "\aFATAL ERROR: ";
009 #===== Copyright 2009, Webpraxis Consulting Ltd. - ALL RIGHTS RESERVED - Email: webpraxis@gmail.com ============================
010 # liebmann_threaded.pl: Threaded solution of Poisson's equation on a rectangular region using Liebmann's method.
011 #===============================================================================================================================
012 #           Usage : perl liebmann_threaded.pl NOBLOCKS SLOW
013 #       Arguments : NOBLOCKS = number of domain partitions.
014 #                   SLOW     = boolean flag for slowing down the iteration process
015 #     Input Files : None.
016 #    Output Files : None.
017 # Temporary Files : None.
018 #         Remarks : See http://www.webpraxis.ab.ca/poisson/liebmann_threaded.shtml for details.
019 #         History : v1.0.0 - January 20, 2009 - Original release.
020 #===============================================================================================================================
021 # 0) DEFINE THE PROBLEM PARTICULARS:
022 my $F               = sub { my( $x, $y ) = @_; return $x * exp $y; };                   #anonymous sub for rhs of Poisson equation
023 my $G               = sub { my( $x, $y ) = @_; return $x * exp $y; };                   #anonymous sub for boundary conditions
024 my $Xa              = 0.;                                                               #domain boundary: lower endoint on the x-axis
025 my $Xb              = 2.;                                                               #domain boundary: upper endoint on the x-axis
026 my $Yc              = 0.;                                                               #domain boundary: lower endoint on the y-axis
027 my $Yd              = 1.;                                                               #domain boundary: upper endoint on the y-axis
028 my $N               = 6;                                                                #number of grid columns
029 my $M               = 5;                                                                #number of grid rows
030 my $Tol             = 1.e-10;                                                           #tolerance in successive estimates
031 my $MaxIterations   = 100;                                                              #maximum number of iterations to attempt
032 #-------------------------------------------------------------------------------------------------------------------------------
033 # 1) INITIALIZE:
034 $| = 1;                                                                                 #set STDOUT buffer to auto-flush
035 cls();                                                                                  #clear screen
036 print colored ['black on white'], "$0\n\n\n", color 'reset';                            #display program name
037
038 my $NoBlocks    = shift || die FATAL, 'Number of partitions not specified';             #get number of domain blocks
039 my $Slow        = shift;                                                                #get flag for slowing down the iterations
040
041 my $H           = ( $Xb - $Xa ) / $N;                                                   #compute mesh size along x-axis
042 my $K           = ( $Yd - $Yc ) / $M;                                                   #compute mesh size along y-axis
043 my $H2          = $H * $H;                                                              #define computational constant
044 my $Lambda      = $H2 / ( $K * $K );                                                    #define computational constant
045 my $Mu          = 2. * ( 1. + $Lambda );                                                #define computational constant
046
047 print " Problem Details:\n\n",                                                          #echo problem details
048       "\t           Grid Size: N = $N, M = $M => ", ($N-1) * ($M-1), " interior mesh points\n",
049       "\t    No of partitions: $NoBlocks\n",
050       "\t                  Xa: $Xa\n",
051       "\t                  Xb: $Xb\n",
052       "\t                   H: $H\n",
053       "\t                  Yc: $Yc\n",
054       "\t                  Yd: $Yd\n",
055       "\t                   K: $K\n",
056       "\t           Tolerance: $Tol\n",
057       "\tMax no of iterations: $MaxIterations\n",
058       "\t         Slow Motion: ", ($Slow) ? 'Yes' : 'No', "\n\n",
059       '-' x 80, "\n\n";
060 die FATAL, "N must exceed the number of partitions" unless $N > $NoBlocks;              #check for minimum grid size
061 die FATAL, "M must exceed 1" unless $M > 1;
062 #-------------------------------------------------------------------------------------------------------------------------------
063 # 2) DEFINE THE INTERIOR MESH POINTS, COMPUTE BOUNDARY VALUES & INITIALIZE SOLUTION ESTIMATE:
064 my @X :shared;                                                                          #array for the interior mesh point x-values
065 my @Y :shared;                                                                          #array for the interior mesh point y-values
066 my @W :shared;                                                                          #array for the boundary values & solution estimates
067    $W[$_] = &share([]) for ( 0..$N );                                                   #extend as array of arrays by adding shared leaf nodes
068
069 $X[$_] = $Xa + $_ * $H for ( 1..$N - 1 );                                               #compute interior mesh point x-values
070 $Y[$_] = $Yc + $_ * $K for ( 1..$M - 1 );                                               #compute interior mesh point y-values
071
072 for ( 1..$N - 1 ) {                                                                     #repeat for each interior horizontal node value
073     $W[$_][$M] = &$G( $X[$_], $Yd );                                                    # compute top boundary condition
074     $W[$_][ 0] = &$G( $X[$_], $Yc );                                                    # compute bottom boundary condition
075 }                                                                                       #until all values processed
076
077 for ( 1..$M - 1 ) {                                                                     #repeat for each interior vertical node value
078     $W[ 0][$_] = &$G( $Xa, $Y[$_] );                                                    # compute left boundary condition
079     $W[$N][$_] = &$G( $Xb, $Y[$_] );                                                    # compute right boundary condition
080 }                                                                                       #until all values processed
081
082 for my $i ( 1..$N - 1 ) {                                                               #repeat for each interior mesh point
083     $W[$i][$_] = 0. for ( 1..$M - 1 );                                                  # init all estimates
084 }                                                                                       #until all interior mesh points processed
085 #-------------------------------------------------------------------------------------------------------------------------------
086 # 3) PARTITION DOMAIN INTO VERTICAL STRIPS; TRY TO BALANCE THE IMPENDING WORK LOAD:
087 my @Blocks;                                                                             #array of hashes for the block node boundaries
088 {                                                                                       #begin naked block
089     my $blockIdx    = 0;                                                                # block index
090     my @noVLines    = ( int( ($N - 1) / $NoBlocks ) ) x $NoBlocks;                      # init number of vertical grid lines per block
091
092     PARTITIONS: {                                                                       # begin domain partitioning
093         $Blocks[0 ] = { TOP => $M - 1, BOTTOM => 1, LEFT => 1, RIGHT => $noVLines[0] };
094         $Blocks[$_] = { TOP => $M - 1, BOTTOM => 1, LEFT => $Blocks[$_-1]{RIGHT} + 1, RIGHT => $Blocks[$_-1]{RIGHT} + $noVLines[$_] }
095          for( 1..$NoBlocks - 1 );
096
097         ++$noVLines[$blockIdx++], redo PARTITIONS                                       #  increase width of one block at the time
098          if $Blocks[$#Blocks]{RIGHT} < $N - 1;                                          #   until all interior nodes allocated
099     }                                                                                   # end domain partitioning
100 }                                                                                       #end naked block
101 #-------------------------------------------------------------------------------------------------------------------------------
102 # 4) ITERATE THE BLOCKS:
103 my $CursorY;                                                                            #cursor row coordinate
104 my @Spinners        = ( '-', '\\', '|', '/' );                                          #define symbols for spinner
105 my @threads;                                                                            #array of threads
106                                                                                         #variables shared between calling program and threads:
107 my $MaxDev          :shared;                                                            # maximum deviation in successive estimates
108 my $NoIterations    :shared;                                                            # counter for number of iterations used
109                                                                                         #variables shared between threads only:
110 my $Halt            :shared;                                                            # boolean flag for halting the threads
111 my $NoBlocksDone    :shared;                                                            # counter for number of partitions processed
112
113 print " Process Details:\n\n";                                                          #inform user of processing start
114 $CursorY = ( Cursor() )[1];                                                             #record cursor row position
115
116 $threads[$_] = threads->create( \&estimateBlock, $_ ) for ( 0..$#Blocks );              #launch a thread for each partition
117 $threads[$_]->join() for ( 0..$#Blocks );                                               #wait for threads to return
118
119 print locate( ++$CursorY + $NoBlocks, 1 ), '-' x 80, "\n\n";
120 die FATAL, "Iteration limit reached (MaxDev = $MaxDev).\n" if $MaxDev > $Tol;           #halt if iteration limit reached
121 #-------------------------------------------------------------------------------------------------------------------------------
122 # 5) OUTPUT THE ESTIMATES, EXACT VALUES AND ERROR NORMS:
123 print   " $NoIterations Iterations Required:\n\n";                                      #report no of iterations
124 printf  "\t%3s %3s      %1s        %1s        %1s        %1s      %7s\n",               #output column headings
125         'i','j','x','y','w','u','|u - w|';
126 print   "\t", '-' x 55, "\n";
127 for my $i ( 1..$N - 1 ) {                                                               #repeat for each interior horizontal node value
128     for my $j ( 1..$M - 1 ) {                                                           # repeat for each interior vertical node value
129         my $u = &$F( $X[$i],$Y[$j] );                                                   #  compute exact solution
130         my $e = abs( $u - $W[$i][$j] );                                                 #  compute error norm
131         printf  "\t%3s %3s   %7.5f  %7.5f  %7.5f  %7.5f  %4.2e\n",                      #  output values
132                 $i,$j,$X[$i],$Y[$j],$W[$i][$j],$u,$e;
133     }                                                                                   # until all vertical values processed
134 }                                                                                       #until all horizontal values processed
135 print "\n\tElapsed time: ", tv_interval( $Start_time ), "\n";                           #report total elapsed time
136 exit;                                                                                   #end processing
137 #===== SUBROUTINES =============================================================================================================
138 #            Usage : &estimateBlock( %BLOCKIDX );
139 #          Purpose : Estimate the solution for a specified partition using Liebmann's method.
140 #        Arguments : %BLOCKIDX = partition index value
141 #        Externals : $CursorY, $H2, $Lambda, $Mu, $NoBlocks, $Slow, @Blocks, @Spinners
142 # Shared Externals : $Halt, $MaxDev, $NoBlocksDone, $NoIterations, @W, @X, @Y
143 #             Subs : $F
144 #          Remarks : None.
145 #          History : v1.0.0 - January 20, 2009 - Original release.
146
147 sub estimateBlock {                                                                     #begin sub
148     my $blockIdx    = shift;                                                            # parameterize the argument
149     my %block       = %{ $Blocks[ $blockIdx ] };                                        # parameterize the block boundaries
150     my $maximum     = sub { my( $x, $y ) = @_; return ($x < $y ) ? $y : $x; };          # anonymous sub for computing the maximum of 2 numbers
151     my $spinnerIdx  = 0;                                                                # init spinner index
152     my $spinner;                                                                        # spinner
153
154     do {                                                                                # repeat
155         my $maxBlockDev = 0;                                                            #  init maximum deviation for the block
156         for my $i ( $block{LEFT}..$block{RIGHT} ) {                                     #  repeat for each block horizontal node value
157             for my $j ( $block{BOTTOM}..$block{TOP} ) {                                 #   repeat for each block vertical node value
158                 $maxBlockDev    = &$maximum( $maxBlockDev, &estimateNode( $i, $j ) );   #    estimate solution at node & update maximum deviation
159                 $spinnerIdx     = ++$spinnerIdx % 4;                                    #    update spinner index
160                 $spinner        = "(" . $Spinners[$spinnerIdx] . ")";                   #    update processing spinner
161                 print   locate( $CursorY + $blockIdx, 3 ), clline, "Partition #$blockIdx: ",
162                         colored ['bold green'], "Iterating " , $spinner;
163                 sleep( 0.2 ) if $Slow;                                                  #    sleep if slowdown requested
164             }                                                                           #   until all vertical values processed
165         }                                                                               #  until all horizontal values processed
166
167         lock( $NoBlocksDone );                                                          #  lock counter for no of blocks processed
168         ++$NoBlocksDone;                                                                #  increment counter
169         $MaxDev = &$maximum( $MaxDev, $maxBlockDev );                                   #  update maximum deviation across all blocks
170         if( $NoBlocksDone == $NoBlocks ) {                                              #  if this was the last block to be processed
171             ++$NoIterations;                                                            #   update iteration counter
172             print   locate( $CursorY + $blockIdx, 3 ), clline, "Partition #$blockIdx: ",
173                     colored ['bold yellow'], "Checking after $NoIterations iteration(s); MaxDev = $MaxDev";
174             sleep( 1.4 ) if $Slow;                                                      #   sleep if slowdown requested
175             $Halt           = ($MaxDev <= $Tol) || ($NoIterations == $MaxIterations);   #   update stopping criterion
176             $NoBlocksDone   = $MaxDev = 0 unless $Halt;                                 #   reset control variables if won't be stopping
177             cond_broadcast( $NoBlocksDone );                                            #   signal other threads to resume
178         } else {                                                                        #  else
179             print   locate( $CursorY + $blockIdx, 3 ), clline, "Partition #$blockIdx: ",
180                     colored ['bold red'], "Waiting...";
181             cond_wait( $NoBlocksDone );                                                 #   wait for remaining threads to finish estimating
182         }                                                                               #  end if-else
183     } until $Halt;                                                                      # until stopping criterion in effect
184     print   locate( $CursorY + $blockIdx, 3 ), clline, "Partition #$blockIdx: ",
185             colored ['bold red'], "Stopped";
186
187     #            Usage : &estimateNode( I, J );
188     #          Purpose : Estimates the solution at a specified grid node. Returns the deviation.
189     #        Arguments : I = horizontal node value
190     #                    J = vertical node value
191     #        Externals : $H2, $Lambda, $Mu
192     # Shared Externals : @W, @X, @Y
193     #             Subs : $F
194     #          History : v1.0.0 - January 19, 2009 - Original release.
195
196     sub estimateNode {                                                                  # begin sub
197         my $i = shift;                                                                  #  parameterize horizontal node value
198         my $j = shift;                                                                  #  parameterize vertical node value
199
200         my $oldW    = $W[$i][$j];                                                       #  save previous estimate at current node
201         $W[$i][$j]  = ( - $H2 * &$F( $X[$i],$Y[$j] ) + $W[$i+1][$j] + $W[$i-1][$j]      #  compute new estimate
202                             + $Lambda * ( $W[$i][$j+1] + $W[$i][$j-1] )
203                       ) / $Mu;
204         return abs( $W[$i][$j] - $oldW );                                               #  return the deviation
205     }                                                                                   # end sub estimateNode
206 }                                                                                       #end sub estimateBlock
207 #===== Copyright 2009, Webpraxis Consulting Ltd. - ALL RIGHTS RESERVED - Email: webpraxis@gmail.com ============================
208 # end of liebmann_threaded.pl

Software Corner

Burden, R.L. and Faires, J.D. (1985): Numerical Analysis, Third Edition, Prindle, Weber & Schmidt, Boston, ISBN 0-87150-857-5.
Gerald, C.F. (1970): Applied Numerical Analysis, Addison-Wesley Publishing Company, Inc., Reading, Massachusetts.
Smith, G.D. (1969): Numerical Solution of Partial Differential Equations, Oxford University Press, London.