Modern XPS
instruments are capable of generating copious quantities of data. When faced
with such overwhelming information, the desire is strong to reduce the data to
a small number of characteristic values from which true knowledge of a sample can be claimed. The success of such an
approach relies heavily on the data reduction steps being stable with
respect to small but irrelevant changes in the data, yet sufficiently
sensitive to reveal subtle but important differences held within. For XPS
spectra, the traditional tools for reducing a large number of data channels to
meaningful characteristic values are peak integration and non-linear least
squares optimisation using synthetic components. Both tools, to a greater or
lesser degree, rely on the experience of the analyst performing the experiment.
While quantification based on regions is generally stable with respect to
changes in the data, optimised synthetic components require more care but
offer chemical state information often lost within relatively coarse
integration regions. Given extensive data sets coupled with these relatively
manually-intensive data reduction options, a movement towards a statistical
approach to characterising such data is a natural development. Principal
Component Analysis (PCA) and related techniques are mathematical procedures for
manipulating a data set into a form containing a reduced number of
characteristic quantities. The usefulness of these techniques, as with all
statistical analyses, depends on an analyst’s ability to interpret the results
in the context of the sample. Interpreting the quantities generated by PCA is
strongly coupled to an understanding of the underlying theory and an
appreciation of the influence of irrelevant data fluctuations on the
results.
This
chapter is concerned with explaining the theory behind PCA and attempting to
place this theory in the context of XPS data. The discussion will begin with
spectral data; however, all statements are equally applicable to image data.
Ultimately, image analysis via PCA combined with the traditional XPS data
reduction tools is the real success of this theory when applied through CasaXPS.
An XPS
spectrum is a set of acquisition bins, where energy separated bins accumulate
counts for a given period of time. The usual interpretation of these data is
as a discrete approximation to a continuous function, plotted as counts per
second against binding energy. However, an alternative view is to interpret the
counts in each energy bin as a coordinate value for one dimension in a
multi-dimensional vector space. Therefore each spectrum may be seen as a vector
in an m-dimensional space, where m is the number of energy bins. In isolation,
the vector interpretation of the data is far less natural than the continuous
function perspective. However, if a spectrum is merely one of many such spectra,
for example a spectrum included as part of a depth profile experiment, then the
vector description becomes more advantageous for the following reason. While an
experiment may contain numerous spectra, not all spectra are entirely
independent of one another. From a practical perspective, an analysis of a
depth profile using a synthetic peak model consisting of a small number of
Gaussian/Lorentzian line-shapes is only possible if these line-shapes, when
combined appropriately, offer an adequate description for each spectrum in the
profile sequence. The use of synthetic models for this purpose is evidence that
the set of spectra from a depth profile, say, must include much redundancy. It
is precisely the objective of multivariate statistics to isolate common trends
within a large sample of data and provide quantitative values characteristic of
these trends. These statistical techniques are essentially based on numerical
algorithms for manipulating sets of co-ordinates and therefore the vector
interpretation becomes more appropriate for large sets of spectra.
The
co-ordinate system in Figure 1 is an extension of the two and three dimensional
Cartesian co-ordinate system. Indeed the Cartesian co-ordinate system is
natural in the sense that the unit vectors in the direction of the mutually-orthogonal
co-ordinate axes are easily written down in terms of the digits one and zero. While
natural from a mathematical viewpoint, the objective of mathematics is to
provide a general description, often at the expense of the specific and while
the Cartesian system works well to simplify general mathematical descriptions,
the specifics of a data set can be hidden by the use of Cartesian co-ordinates.
That is to say, the Cartesian co-ordinate axes are positioned without regard to
the data being represented. In fact, any m-dimensional space
described using the Cartesian unit vectors in Figure 1 could equally well be
described by a set of m mutually-orthogonal vectors from
the same vector space (Figure 2). The trick is, therefore, to choose a basis
set such that the basis vectors $u_j$ describe the shapes
within the set of spectra, very much as the synthetic line-shapes might do when
curve fitting a profile, and allow the coefficients $c_{ji}$ (Figure 2)
to provide quantities characteristic of the surface.
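As a sketch of the expansion that Figure 2 refers to, and assuming the usual convention for an orthonormal basis, the i-th spectrum (written here as $s_i$, a symbol introduced only for illustration) can be expressed as:

```latex
s_i = \sum_{j=1}^{m} c_{ji}\, u_j ,
\qquad
c_{ji} = u_j \cdot s_i ,
\qquad
u_j \cdot u_k = \delta_{jk} .
```

Under these assumptions each coefficient is simply the projection of the spectrum onto the corresponding unit basis vector.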
Consider
the following example. Figure 3
illustrates a relatively small data set of artificial data created from
idealised Gaussian/Lorentzian peak positions, FWHM and relative intensities
based on PMMA and PVC C 1s spectra. The pure PMMA and pure PVC data envelopes
are the top and bottom spectra displayed in the active tile in CasaXPS. Between
these two spectra is a sequence of curves generated by linear interpolation
using the pure shapes, so the set of curves gradually blend from PMMA to PVC.
The spectra represent six vectors in a 322-dimensional vector space, where the numerical
value 322 is the number of acquisition bins for each spectrum. Although the
number of data bins per spectrum is 322, it is easy to realise that six vectors
can describe at most six dimensions and therefore the Cartesian representation
uses many more unit vectors than are required to describe the data in Figure 3.
Simply from the description for the construction of the data set, it is clear
the two spectra for pure PMMA and pure PVC contain all the shape information
within the sequence and therefore the set of six vectors must all lie in a
plane. Further, the best choice for the two basis vectors for this plane would
be precisely the two pure spectra normalised. In this case, all the
quantitative information would lie in just two co-ordinate values with respect
to these two unit vectors. What is more, a linear least squares procedure using
the PMMA and PVC spectra as target vectors would fully characterise the
variations designed into the data set. The key point here is that the set of
six vectors displayed as spectra in Figure 3 only need two vectors and six sets
of co-ordinates to characterise the entire data set.
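The argument above can be checked numerically with a minimal sketch. The envelopes below are hypothetical stand-ins for the PMMA and PVC shapes: the peak positions, widths, areas and energy range are assumptions chosen for illustration, plain Gaussians replace the Gaussian/Lorentzian line-shapes, and the targets are left un-normalised so that the fitted co-ordinates equal the blend fractions.

```python
import numpy as np

def gaussian(x, centre, fwhm, area):
    """Gaussian peak of given area; a simple stand-in for a Gaussian/Lorentzian line-shape."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return area * np.exp(-0.5 * ((x - centre) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# 322 energy bins, as in the text; the binding-energy range is an assumption.
energy = np.linspace(280.0, 295.0, 322)

# Hypothetical envelopes: illustrative parameters, not fitted PMMA or PVC values.
pure_a = (gaussian(energy, 285.0, 1.2, 3.0) + gaussian(energy, 286.8, 1.2, 1.0)
          + gaussian(energy, 289.0, 1.2, 1.0))
pure_b = gaussian(energy, 285.9, 1.3, 1.0) + gaussian(energy, 287.0, 1.3, 1.0)

# Six spectra obtained by linear interpolation between the two envelopes.
fractions = np.linspace(0.0, 1.0, 6)
spectra = np.column_stack([(1.0 - f) * pure_a + f * pure_b for f in fractions])  # 322 x 6

# Six vectors in a 322-dimensional space, yet they only span a plane.
rank = np.linalg.matrix_rank(spectra, tol=1e-10)

# Linear least squares using the two pure envelopes as target vectors.
targets = np.column_stack([pure_a, pure_b])                  # 322 x 2
coords, *_ = np.linalg.lstsq(targets, spectra, rcond=None)   # 2 x 6 co-ordinates
```

Each column of `coords` holds the pair of co-ordinates for one spectrum, so two target vectors and six pairs of numbers reproduce the whole set.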
The problem
of characterising a data set by inspection without prior knowledge is a little
more difficult. The procedure fundamental to determining dominant shapes within
a data set is related to linear least squares: the mechanism underlying a
linear least squares analysis using the pure PMMA and PVC spectra can also be
used to determine a basis for the vector space spanned by the six spectra in
Figure 3. Consider the linear least squares problem: given two basis vectors
corresponding to the PMMA and PVC spectra, the linear-least-squares procedure
constructs a matrix A of size 322 x 2, where the columns of the matrix A
are the two vectors for PMMA and PVC. Following through the analysis of linear
least squares results in the normal equations $A^{T}A\,a = A^{T}b$,
where a is
the least-squares solution for b, an
unknown spectral vector. It can be shown that solving these normal equations is
equivalent to performing a Singular Value Decomposition (SVD) of the matrix A
into three matrices $A = UWV^{T}$, where W is a diagonal matrix constructed
from the eigenvalues of $A^{T}A$ (its diagonal elements are the square roots of those eigenvalues) and V
is the matrix of eigenvectors corresponding to the eigenvalues in W.
The matrix U is the same size as A and corresponds to the
eigenvectors of $AA^{T}$. The least squares solution is computed using
the terms in the Singular Value Decomposition by effectively projecting b, the
unknown, onto the columns of U weighted by the reciprocal of
the diagonal elements from W
and using these quantities to form a linear combination of the vectors from V.
In summary, the least-squares solution for an unknown spectrum in terms of
known spectra is completely determined by the SVD matrices. The eigenvectors
for $A^{T}A$
form an orthogonal basis for the least squares solution space while the columns
of the U matrix form an orthogonal basis set for the same subspace
spanned by the columns of A. What is more, the use of the
matrix $A^{T}A$ in the algorithm guides the choice of basis
towards the variations in the data.
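The equivalence between the normal equations and the SVD route can be illustrated with a short sketch. The matrix and the unknown vector below are random stand-ins rather than real spectra, and the variable names are chosen here for illustration; the SVD solution follows the recipe in the text, namely project b onto the columns of U, divide by the diagonal of W, and combine the vectors of V.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference spectra: two non-negative columns of length 322.
A = rng.random((322, 2))
# Unknown spectrum b: a known mixture of the columns plus a small perturbation.
b = A @ np.array([0.3, 0.7]) + 0.01 * rng.standard_normal(322)

# Route 1: solve the normal equations (A^T A) a = A^T b directly.
a_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: thin SVD, A = U W V^T, with the diagonal of W returned as the vector w.
U, w, Vt = np.linalg.svd(A, full_matrices=False)
# Project b onto the columns of U, weight by 1/w, then combine the vectors of V.
a_svd = Vt.T @ ((U.T @ b) / w)

# The squared diagonal elements of W are the eigenvalues of A^T A.
eigenvalues = np.linalg.eigvalsh(A.T @ A)[::-1]   # sorted largest first
```

Both routes give the same coefficients to within round-off, and `w ** 2` matches the eigenvalues of the 2 x 2 matrix formed from the two reference vectors.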
By analogy,
consider each of the six spectra in Figure 3 as a possible basis vector for
a linear least squares decomposition of an unknown spectrum. An SVD for the
matrix of size 322 x 6, consisting of the six vectors, can be constructed in
the form of the matrices shown in Figure 4. The result is a similar
prescription for determining the set of least-squares vectors within the
subspace spanned by the original six vectors. In other words, the SVD offers up
the new basis for the vector space spanned by the original spectra. Further,
because the matrices U and V are both orthogonal,
their action on any vector preserves the distance metric; the W
matrix therefore contains all the information regarding the relative contributions of
these eigenvectors to the description of the subspace. If a diagonal element of
W is too small then this simply means
the corresponding eigenvector should be excluded from any subsequent
calculations. The implication of a small eigenvalue is that the original vectors
include a linear dependency and therefore the dimension of the subspace is
less than the number of spectra used in the A matrix. What is more,
the diagonal elements of W rank the importance of the basis
vectors appearing as the column vectors in U, hence the commonly used name for
the SVD result, namely, Principal Component Analysis. That is, the principal
components are those column vectors in U for which the corresponding
eigenvalues are considered to be significant.
An SVD
performed on the six spectra in Figure 3 is presented in Figure 5. The traces
in the active tile (Figure 5) are the unit vectors from the U
matrix ranked by magnitude of the corresponding eigenvalue. The tables
appearing in the dialog windows in Figure 5 list the eigenvalues and related
statistics, plus the co-ordinates of the original spectra with respect to the
unit vectors from the U matrix. These results are entirely
consistent with the interpretation of the data set given above, namely, only
two eigenvalues are non-zero and therefore only two linearly independent basis
vectors were found. The outcome of the PCA is therefore that the entire set of
six spectra is completely described by two vectors and six pairs of
co-ordinates. The Cartesian description (Figure 1) required 322 unit vectors
and 322 co-ordinates per vector. The PCA has therefore performed a significant
data reduction. However, the outstanding question of how to interpret the reduced
number of characteristic values still remains.
The unit
vectors displayed in Figure 5 are referred to as abstract factors (AF). These
abstract factors encapsulate the shape information from the set of six spectra,
but have been chosen in such a way that the two significant abstract factors
account for the maximum contribution to the entire set of spectra subject to
the constraint that the abstract factors are mutually orthogonal. The
co-ordinates are often referred to as the loadings
for the data with respect to the abstract factors and, for this example, do follow
a trend from pure PMMA to pure PVC; however the numerical values relate to the
proportions of the abstract factors rather than a direct connection with the
pure spectra. The abstract factors are the appropriate linear combination of
the PMMA and PVC spectra so that if the data set, as a whole, is approximated
by the abstract factors weighted using the corresponding loading, the root mean
square error monotonically decreases with the addition of each new abstract
factor ordered by eigenvalue. Thus, the reduced set of characteristic values does
not clearly identify the chemical information but rather indicates a transition
between two states has occurred.
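A minimal sketch of abstract factors and loadings, using the same kind of hypothetical two-envelope data as the example in Figure 3 (the peak parameters are assumptions, not real PMMA or PVC values). It treats the columns of U as the abstract factors, computes the loadings by projection, and evaluates the root mean square error as factors are added in order of eigenvalue.

```python
import numpy as np

def gaussian(x, centre, fwhm, area):
    """Gaussian peak of given area, used as an illustrative line-shape."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return area * np.exp(-0.5 * ((x - centre) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

energy = np.linspace(280.0, 295.0, 322)
# Illustrative envelopes only (assumed parameters).
pure_a = (gaussian(energy, 285.0, 1.2, 3.0) + gaussian(energy, 286.8, 1.2, 1.0)
          + gaussian(energy, 289.0, 1.2, 1.0))
pure_b = gaussian(energy, 285.9, 1.3, 1.0) + gaussian(energy, 287.0, 1.3, 1.0)
data = np.column_stack([(1.0 - f) * pure_a + f * pure_b
                        for f in np.linspace(0.0, 1.0, 6)])   # 322 x 6

U, w, Vt = np.linalg.svd(data, full_matrices=False)
abstract_factors = U                      # columns are mutually orthogonal unit vectors
loadings = abstract_factors.T @ data      # loadings[j, i]: co-ordinate of spectrum i on factor j

# Count the diagonal elements of W that are distinguishable from round-off.
n_significant = int(np.sum(w > 1e-10 * w[0]))

def rms_error(n_factors):
    """RMS difference between the data and its approximation by the leading factors."""
    approx = abstract_factors[:, :n_factors] @ loadings[:n_factors, :]
    return float(np.sqrt(np.mean((data - approx) ** 2)))

errors = [rms_error(k) for k in range(4)]   # 0, 1, 2 and 3 factors
```

For this noise-free set the error falls with each of the first two factors and is at round-off level thereafter, consistent with only two significant abstract factors.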
It is clear
from the above discussion that PCA is a mathematical tool offering a data
oriented perspective for a large data set. The relationship to the
linear-least-squares procedure provides some comfort to those wishing to
dissect an experiment; however the results of PCA are not always as easy to
interpret as might be thought at first sight. The following is a list of
cautionary notes, where the intention is to expose these pitfalls rather than
condemn PCA as a technique. Indeed, in the right context, PCA is a powerful
tool and for certain types of multi-dimensional data set, applying the theory
of PCA as part of an analysis yields remarkable results.
The first
note relates to curve-fitting and the ability of PCA to assist when
constructing a peak model. Once again, consider the data set in Figure 3. It is
worth emphasizing that the set of six spectra represents a movement
between two data envelopes. Even though each envelope is in turn
constructed from individual Gaussian/Lorentzian peaks, the PCA has no
way of identifying the underlying peaks, since the variation in the data set
was achieved by varying these underlying peaks as a unit. PCA therefore
indicates the number of synthetic models required to describe the changes in
the data set, not the number of synthetic peaks required to construct the
synthetic models. As a result PCA makes a rather weak statement regarding the
number of synthetic components in a data set, namely, PCA at best only offers
the minimum number of components required and will only deliver the exact
number provided all the individual transitions present in a data set vary
independently at some point during the course of the experiment. At worst, PCA
is inconclusive regarding the true number of trends and is entirely misleading if
used to attempt inference of the number of synthetic components in a peak
model.
The data
set shown in Figure 6 is constructed to illustrate the sources for confusion
when interpreting PCA results. The same method for construction of the data
shown in Figure 3 is employed for the first column of VAMAS blocks shown selected
in the data set in Figure 6. In the case
of the data in Figure 6, three synthetic polymer data envelopes are blended
using linear interpolation, namely, PMMA, PVC and PIB. Three additional columns
of VAMAS blocks are similar blends of PVC to modified PVC data, where the
modified PVC is either energy shifted, intensity scaled or convoluted with a
Gaussian to simulate broadening of a spectral feature. All three modifications
to an initial line-shape might be expected throughout the course of a
multi-layer sputter depth profile, for example. A PCA applied to the data in
the first column in Figure 6 is displayed in Figure 7, where the graphical
display plots the loadings for each of the fifteen spectra used in the PCA with
respect to the three most significant abstract factors. The axes in the 3-D
plot represent the first three abstract factors, which as expected, account for
all the shape information in the data set. The abstract factors are also
plotted in Figure 7 and the coordinates for each of the original spectra used
in the PCA are listed.
The first
observation based on the data in Figure 7 is the importance of both parts of
the PCA to the description of the data set. That is, the loadings clearly show
an orderly pattern for the spectra when projected onto the first three abstract
factors, but the meaning of this pattern is only discerned, if at all, by
examining the abstract factors. Now
consider a PCA performed using the same data as that used in Figure 7 but with
the addition of the set of PVC spectra in the third column of VAMAS blocks in
Figure 6. These new additions to the PCA calculation are merely the PVC
spectrum scaled by various constants. Clearly, since PVC is already present in
the original data set, scaling the PVC spectrum and including these data will
not add any new shape information to the data set. Thus performing a PCA on the
new set of spectra results in the information displayed in Figure 8, where the
PCA result dialog window indicates there are still only three significant
(non-zero) eigenvalues, as expected. The addition of these scaled PVC spectra
to the PCA calculation has, however, significantly altered the abstract
factors. Comparing Figure 7 and Figure 8, it is clear the
loadings are very different and, if the loadings are taken in isolation, the inference
would be that a dramatic difference had occurred between the two data sets. An
initial response might be that such differences are good and offer an indicator of
changes between samples. However, consider a depth profile through an oxide
layer in silicon. If the same material is profiled, but using a different
number of etch-cycles, then the two data sets would be very similar to the
situation in this example. Extra etch-cycles through the metallic silicon layer
would bias the data set towards the metallic signal and therefore the two
experiments might easily generate two different profiles based on the loadings
alone. The important thing to note is that the PCA is characterising changes in
not only the sample, but also the experimental parameters used to perform the
measurement. Changing the acquisition parameters can introduce major changes in
the abstract factors and therefore the loadings.
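The effect of adding scaled copies of one spectrum can be sketched as follows. The three envelopes are hypothetical stand-ins for PMMA, PVC and PIB (all peak parameters and scale constants are assumptions), and the overlap between the leading abstract factors of the two analyses is used here as a simple, illustrative measure of how much the factors have changed.

```python
import numpy as np

def gaussian(x, centre, fwhm, area):
    """Gaussian peak of given area, used as an illustrative line-shape."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return area * np.exp(-0.5 * ((x - centre) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

energy = np.linspace(280.0, 295.0, 322)
# Three illustrative envelopes (assumed parameters, not real polymer spectra).
env_a = (gaussian(energy, 285.0, 1.2, 3.0) + gaussian(energy, 286.8, 1.2, 1.0)
         + gaussian(energy, 289.0, 1.2, 1.0))
env_b = gaussian(energy, 285.9, 1.3, 1.0) + gaussian(energy, 287.0, 1.3, 1.0)
env_c = gaussian(energy, 285.0, 1.1, 4.0)

def blend(first, second, n_steps):
    """Linear interpolation between two envelopes, returned as columns."""
    return np.column_stack([(1.0 - f) * first + f * second
                            for f in np.linspace(0.0, 1.0, n_steps)])

# Fifteen spectra: A -> B (8 spectra) followed by B -> C (7 more, B not repeated).
original = np.hstack([blend(env_a, env_b, 8), blend(env_b, env_c, 8)[:, 1:]])

# Extra spectra: the B envelope scaled by various constants (no new shape information).
scaled = np.column_stack([s * env_b for s in np.linspace(0.5, 3.0, 6)])
extended = np.hstack([original, scaled])

def pca(data):
    """Return abstract factors, diagonal of W and the number of significant factors."""
    U, w, _ = np.linalg.svd(data, full_matrices=False)
    return U, w, int(np.sum(w > 1e-10 * w[0]))

U_orig, w_orig, n_orig = pca(original)
U_ext, w_ext, n_ext = pca(extended)

# |cos(angle)| between the leading abstract factors of the two analyses.
overlap = abs(float(U_orig[:, 0] @ U_ext[:, 0]))

# Loadings of the same fifteen spectra with respect to each set of factors.
loadings_orig = U_orig[:, :3].T @ original
loadings_ext = U_ext[:, :3].T @ original
```

The number of significant factors stays at three in both cases, while an overlap noticeably below one shows that the leading abstract factor, and hence the loadings of the same fifteen spectra, has moved.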
Two further
sequences of PVC spectra are included in Figure 6. The consequence of a peak
shifting in energy is an additional significant abstract factor. Similarly, a
peak broadening also introduces a new abstract factor so when these two sets of
essentially PVC spectra are added to the data from the first column, a PCA for
these data results in five significant non-zero eigenvalues and therefore
abstract factors (Figure 9). At this point it is worth noting that the
synthetic data used in these tests does not include noise, hence the
eigenvalues, while small, are still very distinct from the eigenvalues
genuinely equal to zero. The point of introducing these modifications to the
PVC spectrum is that PCA, when confronted with commonly found variations in XPS
spectra, indicates an increased number of trends in the data set compared to
those that might be considered significant with respect to the sample. Bearing
in mind XPS includes other adjustments to peak shapes resulting from background
variations, the prospect of assigning the number of trends within a data set based
on the number of significant abstract factors becomes daunting.
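A small sketch of how these modifications change the count of significant factors, using a single hypothetical two-peak envelope (the peak parameters, the 0.3 eV shift and the 0.8 eV broadening are assumptions). Because the peaks are plain Gaussians, convolution with a Gaussian is applied analytically by adding the widths in quadrature.

```python
import numpy as np

def gaussian(x, centre, fwhm, area):
    """Gaussian peak of given area, used as an illustrative line-shape."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return area * np.exp(-0.5 * ((x - centre) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

energy = np.linspace(280.0, 295.0, 322)

def envelope(shift=0.0, extra_fwhm=0.0):
    """Illustrative two-peak envelope, optionally energy shifted or broadened.

    A Gaussian convolved with a Gaussian of width extra_fwhm is a Gaussian whose
    FWHM is the quadrature sum of the two widths.
    """
    fwhm = np.sqrt(1.3 ** 2 + extra_fwhm ** 2)
    return (gaussian(energy, 285.9 + shift, fwhm, 1.0)
            + gaussian(energy, 287.0 + shift, fwhm, 1.0))

def blend(first, second, n_steps=5):
    """Linear interpolation between two envelopes, returned as columns."""
    return np.column_stack([(1.0 - f) * first + f * second
                            for f in np.linspace(0.0, 1.0, n_steps)])

def n_significant(data, rel_tol=1e-10):
    """Number of diagonal elements of W above round-off level (noise-free data only)."""
    w = np.linalg.svd(data, compute_uv=False)
    return int(np.sum(w > rel_tol * w[0]))

base = envelope()
scaled_set = np.column_stack([s * base for s in np.linspace(1.0, 2.0, 5)])
shifted_set = blend(base, envelope(shift=0.3))
broadened_set = blend(base, envelope(extra_fwhm=0.8))

n_scaled = n_significant(scaled_set)                                # intensity scaling only
n_shifted = n_significant(np.hstack([scaled_set, shifted_set]))     # plus an energy shift
n_all = n_significant(np.hstack([scaled_set, shifted_set, broadened_set]))  # plus broadening
```

Intensity scaling leaves a single factor, whereas the shift and the broadening each contribute one more, even though every spectrum is essentially the same envelope.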
Real XPS
differs from the spectra in Figure 6 in that Poisson statistics cause
variations in the signal which can mask underlying trends otherwise identified
by PCA. The six abstract factors in the left-hand-side of Figure 9 are
normalised vectors and while the least significant abstract factor may appear
to contain noise, in fact the rapid oscillations in the unit vector are due to
numerical round-off errors. The corresponding eigenvalue for the sixth abstract
factor is twelve orders of magnitude smaller than the largest eigenvalue and
since the eigenvalues contain the size information from a PCA, the sixth
abstract factor makes no contribution to the description of the original data set.
When simulated noise consistent with Poisson statistical variations is added
to the same data as seen in Figure 6, the result is the abstract factors
and eigenvalues in the right-hand-side of Figure 9. The sixth abstract factor
for the noisy data set is as significant in terms of size as the fifth, which
itself resembles noise and therefore has lost much of the shape information
clearly present in the noise-free PCA abstract factors. The point to take from
this example is that information in the data set clearly distinguishable in the
absence of noise is dispersed throughout the abstract factors for data with
noise content and therefore identifying the number of underlying trends is far
less straightforward for real data sets.
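The influence of counting noise on the eigenvalues can be sketched with simulated data. The envelopes, background level and count rates below are assumptions, and the noise is drawn with NumPy's Poisson generator using the noise-free counts as the mean of each bin.

```python
import numpy as np

def gaussian(x, centre, fwhm, area):
    """Gaussian peak of given area, used as an illustrative line-shape."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return area * np.exp(-0.5 * ((x - centre) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
energy = np.linspace(280.0, 295.0, 322)

# Illustrative envelopes in counts on a flat background (all values are assumptions).
pure_a = 200.0 + 1000.0 * (gaussian(energy, 285.0, 1.2, 3.0) + gaussian(energy, 288.5, 1.2, 1.0))
pure_b = 200.0 + 1000.0 * (gaussian(energy, 285.9, 1.3, 1.0) + gaussian(energy, 287.0, 1.3, 1.0))
clean = np.column_stack([(1.0 - f) * pure_a + f * pure_b
                         for f in np.linspace(0.0, 1.0, 6)])   # 322 x 6, noise free

# Simulated counting noise: each bin is a Poisson draw with the noise-free count as mean.
noisy = rng.poisson(clean).astype(float)

w_clean = np.linalg.svd(clean, compute_uv=False)
w_noisy = np.linalg.svd(noisy, compute_uv=False)

# Diagonal elements of W relative to the largest one.
ratio_clean = w_clean / w_clean[0]
ratio_noisy = w_noisy / w_noisy[0]
```

Without noise the third and later values sit at round-off level; with noise they are many orders of magnitude larger and similar in size to one another, so a simple threshold no longer separates signal from noise so cleanly.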
The
conclusion from these cautionary notes would seem to be that PCA is of limited use
for XPS spectra. However, the data sets have been chosen to illustrate
potential difficulties and the primary source of these limitations is
the size of the data set. Statistical procedures are best used when there is
difficulty understanding large quantities of data. The test data sets are small
and very understandable, hence PCA compares poorly to a simple inspection of
the data in the light of the methods used to create the spectra. Without prior
knowledge and obvious structure in the data set, the structures pulled from the
data by PCA would have been more rewarding. What is more, for genuinely large
data sets, reducing the data to a small quantity of, albeit, abstract
information does provide an insight into the nature of the problem. With these
thoughts in mind, the emphasis in the discussion will switch away from small
and less appropriate data sets to large quantities of data for which PCA is
used to great effect.
The name
Principal Component Analysis refers to the ability of an SVD to create sets of
basis vectors capable of describing the same sub-space as the original set of
non-orthogonal vectors defined by the data, where these basis vectors are
ranked according to their information content by the magnitude of the
corresponding eigenvalues. It is precisely this ability to measure and order
the shape information within the data that makes PCA such a powerful tool when
handling large data sets with a relatively small number of trends throughout
the full set of data. Even for the small data set shown in Figure 6, when a PCA
is performed on these data with simulated noise added, the shape information is
concentrated into the abstract factors with the largest eigenvalues. What
is more, the random noise is gathered in the abstract factors with small
eigenvalues. A PCA of a data set therefore presents an opportunity to partition
the original vector space into useful
vectors and noise vectors. The success of this operation will depend on the
extent to which useful information is mixed with the noise. The situation most
favourable to the separation of noise from the useful information occurs when a
set of trends is massively oversampled. Happily, these types of experiments
are becoming more common.
To
understand the context for information partitioning using PCA, consider the
following analogy. Acquisition of a single spectrum is typically performed
using a total acquisition time chosen to result in an acceptable
signal-to-noise for the ensuing data. Logically the spectrum is constructed
from a set of spectra all assumed to be identical and acquired at smaller
dwell-times, all of which when summed equal the desired acquisition-time.
Summing these spectra and averaging in time provides a very simple
one-dimensional linear-least-squares solution, namely, a single spectrum with
adequate signal-to-noise. However, the assumption is that all the summed
spectra are equivalent. Now consider a variation on this theme: the problem of
acquiring spectra from a sample where there is a possibility the sample is
inhomogeneous and therefore different locations on the sample may
yield different spectra. This would prevent improving the signal-to-noise in
the data set by simply summing the spectra from numerous locations on the
surface. If every location on a sample surface results in a different spectrum
then there would be no alternative but to acquire each spectrum with the
dwell-time appropriate for the desired signal-to-noise. On the other hand, if
the number of distinct spectra is small compared to the number of locations
measured, then PCA offers a means of partitioning the set of spectra from the
different locations into a small number of abstract factors representing the
shapes in the spectra and a set of abstract factors characteristic of the noise
in the entire data set. If linear combinations of the significant abstract
factors are used to reconstruct the spectra, then the result is a set of spectra
for which the signal-to-noise is characteristic of the acquisition time for the
entire set of spectra rather than the dwell-time of the individual spectra. The
term, reconstruct the spectra, means
perform a PCA, identify the significant abstract factors and only use those
abstract factors added together in a sum similar to the sum shown in Figure 2,
where all loadings deemed associated with noise abstract factors are set equal
to zero. The result is a set of spectra with signal-to-noise enhanced but each
spectrum is associated with a different location on the sample surface and is
free to differ to the extent of the information in the retained abstract
factors. Further, the reconstructed spectra are a linear least squares
approximation to the original data where the basis vectors are the abstract
factors containing the shape information.
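The reconstruction step can be sketched for a simulated set of oversampled spectra. Everything specific here is an assumption: two hypothetical spectral shapes, 400 sample locations with random mixing fractions, low count rates to mimic a short dwell-time, and prior knowledge that two abstract factors are significant.

```python
import numpy as np

def gaussian(x, centre, fwhm, area):
    """Gaussian peak of given area, used as an illustrative line-shape."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return area * np.exp(-0.5 * ((x - centre) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
energy = np.linspace(280.0, 295.0, 322)

# Two illustrative shapes in counts; low values mimic a short dwell-time per spectrum.
shape_a = 5.0 + 20.0 * gaussian(energy, 285.0, 1.2, 3.0)
shape_b = 5.0 + 20.0 * gaussian(energy, 287.0, 1.3, 2.0)

# 400 hypothetical sample locations, each a different mixture of the two shapes.
fractions = rng.random(400)
truth = np.outer(shape_a, 1.0 - fractions) + np.outer(shape_b, fractions)   # 322 x 400
measured = rng.poisson(truth).astype(float)

# PCA by SVD, then reconstruction keeping only the significant abstract factors;
# the loadings for all remaining (noise) factors are in effect set to zero.
n_keep = 2
U, w, Vt = np.linalg.svd(measured, full_matrices=False)
reconstructed = (U[:, :n_keep] * w[:n_keep]) @ Vt[:n_keep, :]

def rms(x):
    """Root mean square of an array."""
    return float(np.sqrt(np.mean(x ** 2)))

error_raw = rms(measured - truth)                 # noise in the individual spectra
error_reconstructed = rms(reconstructed - truth)  # noise after reconstruction
```

Because the noise-free spectra are known in a simulation, the two errors can be compared directly; each reconstructed column still belongs to its own location but carries far less noise than the corresponding raw spectrum.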
The
scenario described above is commonly found for techniques where parallel data
acquisition is possible for a set of experimental variables. Two obvious
examples are ToF SIMS images, where an ion beam rastered over a surface pauses at
each pixel to collect a parallel mass spectrum, and stigmatic XPS imaging
technology, where parallel images of a surface are acquired at varying energy
(Figure 10). Both these examples are logically equivalent in that both result
in spectra at each pixel of an image. If a set of XPS images are acquired over
a sequence of energy steps spanning a photoelectric peak, then potentially
numerous images result from a small number of chemical states, background
artefacts and instrumental variations; a situation ideal for decomposition
using PCA.
As stated
earlier, the data matrix can be constructed from spectra or images. The SVD
proceeds identically for both apparently different, yet mathematically
equivalent types of data. The analysis of a stack of images as logically depicted
in Figure 10 would proceed using the same steps used to process the spectra in
Figure 6. The intention, for the image set, is to identify abstract factor
images containing spatially significant information and reconstruct the set of
images from only those loadings and abstract factors without noise. Once the
images are reconstructed from the significant abstract factors, the spectra at
each pixel are extracted with the view to defining quantification regions
and/or synthetic components. The quantification items are then used to compute
RSF scaled images in preparation for atomic concentration calculations on a
pixel-by-pixel basis. The true significance of performing the PCA analysis on
the image set lies in the dramatic reduction in noise associated with the
spectra at a pixel. Before processing, the signal-to-noise in the spectra leaves
much to be desired; however, following PCA and omitting the noise components
from the reconstruction step, the signal-to-noise is characteristic of the time
invested in acquiring the entire set of images. For XPS, the significant gain
comes from the quality of the computed background now possible on the spectra at
each pixel (Figure 11).
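For an image stack the only additional step is the bookkeeping that turns images into vectors and back. The sketch below uses a simulated stack; the image size, number of energy steps, peak shape, circular feature and the choice of two retained factors are all assumptions made for illustration and do not reproduce the CasaXPS processing options.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical image stack: 40 energy steps, each a 32 x 32 pixel image.
n_energy, n_rows, n_cols = 40, 32, 32
energy = np.linspace(570.0, 595.0, n_energy)
peak = np.exp(-0.5 * ((energy - 577.0) / 1.5) ** 2)    # illustrative peak shape
background = np.linspace(1.0, 0.5, n_energy)            # illustrative sloping background

# Spatial maps: a circular zone where the peak is absent and the background is higher.
rows, cols = np.mgrid[0:n_rows, 0:n_cols]
inside = ((rows - 16) ** 2 + (cols - 16) ** 2 < 64).astype(float)
truth = (30.0 * peak[:, None, None] * (1.0 - inside)
         + 10.0 * background[:, None, None] * (1.0 + inside))
stack = rng.poisson(truth).astype(float)                 # shape (40, 32, 32)

# Each image becomes one column vector of length n_rows * n_cols.
matrix = stack.reshape(n_energy, -1).T                   # 1024 x 40

n_keep = 2
U, w, Vt = np.linalg.svd(matrix, full_matrices=False)
filtered = (U[:, :n_keep] * w[:n_keep]) @ Vt[:n_keep, :]

# Back to image form; the spectrum at a pixel is read along the energy axis.
filtered_stack = filtered.T.reshape(n_energy, n_rows, n_cols)
abstract_factor_images = U[:, :n_keep].T.reshape(n_keep, n_rows, n_cols)
pixel_spectrum = filtered_stack[:, 5, 7]
```

The spectrum extracted at a pixel from `filtered_stack` is the kind of curve on which quantification regions or synthetic components would then be defined.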
Figure 11
shows an example of a spectrum taken from an image pixel, where the two spectra
correspond to the unprocessed data and the same data following data
reconstruction using the first four abstract factors from a stack of images
acquired at 0.3 eV energy steps across a Cr 2p doublet. The sample was an MRS-3
(Magnification Reference Standard) where ITO (In-Sn oxide) circles are
surrounded by CrO2. The data was acquired on an un-calibrated Kratos
Axis Ultra using a pulse counting delay line detector (DLD). The consequence of
using an un-calibrated instrument is that the images contain artefacts due to
the instrumental setup and therefore the data set provided an extreme example
of mixing information from more than simply chemical state variations. The
image in Figure 11 corresponds to the integration of the counts per second
between the Cr 2p peaks and background for each pixel in the image set. The
variation in intensity in the Cr 2p image is a result of not calibrating the
DLD for flat field response before acquiring the data and accounts for one of
the four significant abstract factors (Figure 12) required for the
reconstruction step. A second abstract factor can be attributed to a further
instrumental artefact, namely, peak alignment over the field of view, again
engineered into the data set by design. If the instrument were properly tuned
and calibrated, these two factors would diminish; however, a third factor
corresponding to the background signal contains structure as a consequence of
variations due to the In-Sn oxide zones and represents an unwanted mixing of
spatial information exposed by PCA. Other possible explanations for these
chromium images containing more than one factor might be variations in X-ray
flux over the imaged area and topographical effects. Regardless of the source
for these additional factors, their inclusion in the reconstruction step is
important for a proper move from noise filled data to the smooth curve seen in
Figure 11.
Interpreting
a set of abstract factors as seen in Figure 12 is far from easy, as even the
most significant abstract factor is determined not just from the most dominant trend
in an image set, but is biased by the minor structures too. Visual inspection
might lead one to believe the top left abstract factor in Figure 12
corresponds to the flat field response of the detection system and that
excluding this abstract factor from the reconstruction step would lead to a set of
calibrated images. The problem is that the abstract factors are determined as a
unit and therefore omitting the most significant factor from the reconstruction
of the images, while removing a large proportion of the flat field response,
will also remove background and subtle chemical information. The best method
for determining chemically meaningful images is to include all the abstract
factors of significance, regardless of source, and process the chemical
information using the spectra at each pixel with standard, tried and tested
quantification procedures from spectral quantification. Quantification items
from regions and synthetic components, when combined using the standard atomic
concentration formula, produce images free from instrumental, topographical and
background artefacts. Quantification of the MRS-3 sample using five photoelectric
lines and basic quantification regions produces the atomic concentration images
seen in Figure 13. Compare the Cr 2p image in Figure 13 to the image in Figure
11. The intensity variations due to the DLD flat field response are absent from
Figure 13 and the image information can be trusted to the same degree as
atomic concentration tables generated from conventional spectral data.
The subject
of how to determine the number of abstract factors of significance to the data
description is yet to be discussed. When creating the Cr 2p spectra in Figure
11, four abstract factors were used to approximate each image in the data set.
Figure 14 illustrates the linear combination of the abstract factors for one
image in the set of images acquired over the Cr 2p doublet energy range. Included
in Figure 14 are the four abstract factors combined using four coefficients,
which are effectively determined in a least-squares sense, the image resulting
from the least-squares summation, the corresponding raw image and the fifth
abstract factor not used in the reconstruction step. Interestingly, none of the
four abstract factors individually bears any resemblance to the resulting image.
In fact the characteristics visible in the abstract factors are more easily
attributed to instrumental and background artefacts; the chromium image is
buried within and only reappears once an appropriate linear combination is
prescribed. The main reason for including four abstract factors is the lack of
any structure in the fifth abstract factor, also shown in Figure 14. The
subject of Factor Analysis is filled with statistics for assessing the number
of significant components; however, none are foolproof and when the objective is
data reconstruction, which is the basis for the application in CasaXPS,
assessing the quality of the reconstructed data as a function of the number of
abstract factors would seem to be the most reasonable way to proceed. The greatest
danger, however, is that minor yet important structure in the data set is
blended in with the noise. A consequence of omitting information dispersed
throughout the noise abstract factors is that the reconstructed spectra contain
non-physical features. Indeed, assessing the data using both reconstructed
images and spectra provides a consistency check for the procedure in general.
Both must continue to make sense from a physical perspective. It is therefore
worth experimenting with the number of abstract factors used to reconstruct the
data set and observe the development of noise within the reconstructed data
with each additional abstract factor.
The
significance of noise compared to genuine, but minor shape information can be
important for reconstructed spectra. Figure 9 provides an example of where the
introduction of noise has masked a real structure in the fifth abstract factor.
A problem with XPS data is that noise varies in magnitude with the square root
of the counts per data bin. For spectra where the peak is significantly more
intense than the background, the variation of the noise over an energy interval
is non-linear and more significant to a PCA calculation than a broadening or
small shifts in a peak position. One way to suppress the variations in the
noise as a function of intensity is to divide each bin by the square root of
the counts per bin. Dividing the counts in a bin by their square root is
equivalent to taking the square root of the counts, and so squaring the result returns
the data to the original state. The spectrum in Figure 11 and the abstract
factors in Figure 14 are in fact determined for the data following the square
root function applied to each pixel and the reconstructed image in Figure 14 is
therefore the square of the linear combination of abstract factors.
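The effect of the square root on counting noise can be illustrated with simulated Poisson counts. The two mean count levels below are assumptions standing in for a background bin and an intense peak bin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated counts in a weak background bin and in an intense peak bin (assumed means).
background_counts = rng.poisson(100.0, size=100_000).astype(float)
peak_counts = rng.poisson(10_000.0, size=100_000).astype(float)

# Raw noise scales with the square root of the counts: roughly 10 and 100 here.
std_raw = (background_counts.std(), peak_counts.std())

# After the square-root transform the noise is nearly independent of intensity:
# both standard deviations are close to 0.5.
std_sqrt = (np.sqrt(background_counts).std(), np.sqrt(peak_counts).std())

# For non-zero counts, dividing by the square root equals taking the square root,
# and squaring undoes the transform.
transformed = background_counts / np.sqrt(background_counts)
restored = transformed ** 2
```

With the noise placed on a common scale in this way, intense peaks no longer dominate the noise contribution to the PCA, and squaring the reconstructed data returns it to counts.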

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

Figure 9: The first six abstract factors for two equivalent data sets, with and without simulated noise.

Figure 10

Figure 11

Figure 12

Figure 13

Figure 14