Abstract: In large 3-D finite element optical tomography problems, the computation times for forward and adjoint solutions and for the calculation of sensitivities can become prohibitive. Parallelizing computer codes can yield speedups approaching the number of processors employed, but parallel codes and computer systems can be difficult and expensive to develop and maintain. We show that highly vectorized code, which takes advantage of the pipelining capabilities of the microprocessor, achieves dramatic speedups for forward and adjoint sensitivity calculations on a single-processor microcomputer, and that these speedups actually increase with problem size. Our vectorized implementations replicate large amounts of data and are thus memory intensive; however, we effectively remove memory constraints by using domain decomposition to control the use of virtual memory. For a large (98,304-element) mesh, global matrix assembly is sped up by a factor of 6.5, and adjoint calculations of the sensitivity of emission fluence with respect to fluorescence absorption are sped up by a factor of 502, on a single-processor 2.2 GHz Pentium 4.
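The sketch below illustrates the general idea of vectorized global matrix assembly with data replication, processed in blocks to bound memory use. It is a minimal illustration, not the authors' code. It assumes the diffusion approximation (operator -div(D grad u) + mua*u), linear tetrahedral elements with element-wise constant `D` and `mua`, and NumPy. The names `element_matrices` and `assemble_chunked` and the `chunk` parameter are hypothetical. Splitting the elements into fixed-size blocks is a simple stand-in for the domain decomposition described above.

```python
import numpy as np

def element_matrices(nodes, tets, D, mua):
    """Vectorized P1 element matrices for -div(D grad u) + mua*u on tetrahedra.

    nodes: (n_nodes, 3) floats; tets: (m, 4) ints; D, mua: (m,) floats.
    Returns an (m, 4, 4) array; there is no Python loop over elements.
    """
    X = nodes[tets]                                  # (m, 4, 3): vertex coords replicated per element
    # Jacobian columns are the edge vectors X1-X0, X2-X0, X3-X0: shape (m, 3, 3)
    J = np.transpose(X[:, 1:, :] - X[:, :1, :], (0, 2, 1))
    vol = np.abs(np.linalg.det(J)) / 6.0             # batched determinants -> element volumes
    Jinv = np.linalg.inv(J)                          # batched inverses
    # Reference gradients of the four barycentric basis functions (4 x 3)
    ref_grad = np.array([[-1, -1, -1], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
    G = ref_grad @ Jinv                              # physical gradients, (m, 4, 3)
    K = (D * vol)[:, None, None] * (G @ np.transpose(G, (0, 2, 1)))
    M_ref = (np.ones((4, 4)) + np.eye(4)) / 20.0     # exact P1 mass matrix divided by volume
    M = (mua * vol)[:, None, None] * M_ref
    return K + M

def assemble_chunked(nodes, tets, D, mua, chunk=4096):
    """Assemble the global matrix as summed (row, col, value) triplets.

    Elements are processed `chunk` at a time so the replicated per-element
    arrays never exceed roughly chunk * 16 entries.
    """
    n = len(nodes)
    keys_parts, vals_parts = [], []
    for start in range(0, len(tets), chunk):
        sl = slice(start, start + chunk)
        t = tets[sl]
        Ke = element_matrices(nodes, t, D[sl], mua[sl])
        rows = np.repeat(t, 4, axis=1)               # (m, 16): global row of local entry (i, j)
        cols = np.tile(t, (1, 4))                    # (m, 16): global column of local entry (i, j)
        keys = (rows.astype(np.int64) * n + cols).ravel()
        # Sum duplicate (row, col) pairs inside this block before keeping them
        uk, inv = np.unique(keys, return_inverse=True)
        keys_parts.append(uk)
        vals_parts.append(np.bincount(inv, weights=Ke.ravel()))
    # Merge the blocks, summing entries shared between blocks
    uk, inv = np.unique(np.concatenate(keys_parts), return_inverse=True)
    vals = np.bincount(inv, weights=np.concatenate(vals_parts))
    return uk // n, uk % n, vals
```

The triplets can be handed to any sparse matrix constructor. The trade-off the abstract describes is visible here: `X`, `rows`, `cols`, and `Ke` replicate nodal data once per element, which is what allows the batched operations, and the `chunk` size caps how much of that replicated data is alive at once.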