Principal Component Analysis

 

Modern XPS instruments are capable of generating copious quantities of data. When faced with such overwhelming information, the desire is strong to reduce the data to a small number of characteristic values from which true knowledge of a sample can be claimed. The success of such an approach relies heavily on the data reduction steps being stable with respect to small but irrelevant changes in the data, yet sufficiently sensitive to reveal subtle but important differences held within. For XPS spectra, the traditional tools for reducing large numbers of data channels to meaningful characteristic values are peak integration and non-linear least squares optimisation using synthetic components. Both tools, to a greater or lesser degree, rely on the experience of the analyst performing the experiment. While quantification based on regions is generally stable with respect to changes in the data, optimised synthetic components require more care but offer chemical state information often lost within relatively coarse integration regions. Given extensive data sets coupled with these relatively manually intensive data reduction options, a movement towards a statistical approach to characterising such data is a natural development. Principal Component Analysis (PCA) and related techniques are mathematical procedures for manipulating a data set into a form containing a reduced number of characteristic quantities. The usefulness of these techniques, as with all statistical analyses, depends on an analyst's ability to interpret the results in the context of the sample. Interpreting the quantities generated by PCA is strongly coupled to an understanding of the underlying theory and an appreciation of the influence of irrelevant data fluctuations on the results.

 

This chapter is concerned with explaining the theory behind PCA and attempting to place this theory in the context of XPS data. The discussion will begin with spectral data; however, all statements are equally applicable to image data. Ultimately, image analysis via PCA combined with the traditional XPS data reduction tools is the real success of this theory when applied through CasaXPS.

 

Data and Vectors

 

An XPS spectrum is a set of acquisition bins, where energy separated bins accumulate counts for a given period of time. The usual interpretation of these data is as a discrete approximation to a continuous function, plotted using counts per second against binding energy. However, an alternative view is to interpret the counts in each energy bin as a co-ordinate value for one dimension in a multi-dimensional vector space. Each spectrum may therefore be seen as a vector in an m-dimensional space, where m is the number of energy bins. In isolation, the vector interpretation of the data is far less natural than the continuous function perspective. However, if a spectrum is merely one of many such spectra, for example a spectrum included as part of a depth profile experiment, then the vector description becomes more advantageous for the following reason. While an experiment may contain numerous spectra, not all spectra are entirely independent of one another. From a practical perspective, an analysis of a depth profile using a synthetic peak model consisting of a small number of Gaussian/Lorentzian line-shapes is only possible if these line-shapes, when combined appropriately, offer an adequate description for each spectrum in the profile sequence. The use of synthetic models for this purpose is evidence that the set of spectra from a depth profile, say, must include much redundancy. It is precisely the objective of multivariate statistics to isolate common trends within a large sample of data and provide quantitative values characteristic of these trends. These statistical techniques are essentially based on numerical algorithms for manipulating sets of co-ordinates and therefore the vector interpretation becomes more appropriate for large sets of spectra.
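The vector view can be illustrated with a minimal sketch, assuming NumPy and entirely synthetic data. The energy axis, peak positions and widths below are hypothetical choices, not values taken from any measurement. Each spectrum becomes one column of a data matrix, and the rank of that matrix exposes the redundancy described above.

```python
import numpy as np

m = 322                                  # number of energy bins (assumed)
energy = np.linspace(280.0, 296.0, m)    # hypothetical binding-energy axis in eV

def gaussian(x, centre, fwhm):
    """Unit-height Gaussian used here as a stand-in line-shape."""
    sigma = fwhm / 2.3548
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

# Two hypothetical line-shapes whose proportions change through a profile.
shape_a = gaussian(energy, 285.0, 1.2)
shape_b = gaussian(energy, 288.5, 1.4)

# Ten spectra, each a vector in m-dimensional space, stored as matrix columns.
fractions = np.linspace(0.0, 1.0, 10)
data = np.column_stack([(1 - f) * shape_a + f * shape_b for f in fractions])

n_independent = np.linalg.matrix_rank(data)
print(data.shape)       # (322, 10)
print(n_independent)    # 2
```

Although ten vectors are stored, only two are linearly independent, which is the kind of redundancy multivariate methods are designed to find.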

 

Co-ordinate Systems, Basis Vectors and PCA

 

The co-ordinate system in Figure 1 is an extension of the two and three dimensional Cartesian co-ordinate system. Indeed the Cartesian co-ordinate system is natural in the sense that the unit vectors in the direction of the mutually-orthogonal co-ordinate axes are easily written down in terms of the digits one and zero. While natural from a mathematical viewpoint, the objective of mathematics is to provide a general description, often at the expense of the specific. The Cartesian system works well to simplify general mathematical descriptions, but the specifics of a data set can be hidden by the use of Cartesian co-ordinates. That is to say, the Cartesian co-ordinate axes are positioned without regard to the data being represented. The fact is that any m-dimensional space described using the Cartesian unit vectors in Figure 1 could equally well be described by a set of m mutually-orthogonal vectors from the same vector space (Figure 2). The trick is, therefore, to choose a basis set such that the basis vectors u_j describe the shapes within the set of spectra, very much as the synthetic line-shapes might do when curve fitting a profile, and allow the coefficients c_ji (Figure 2) to provide quantities characteristic of the surface.

 

Consider the following example. Figure 3 illustrates a relatively small data set of artificial data created from idealised Gaussian/Lorentzian peak positions, FWHM and relative intensities based on PMMA and PVC C 1s spectra. The pure PMMA and pure PVC data envelopes are the top and bottom spectra displayed in the active tile in CasaXPS. Between these two spectra is a sequence of curves generated by linear interpolation using the pure shapes, so the set of curves gradually blends from PMMA to PVC. The spectra represent six vectors in a 322-dimensional vector space, where the numerical value 322 is the number of acquisition bins for each spectrum. Although the number of data bins per spectrum is 322, six vectors can span at most six dimensions and therefore the Cartesian representation uses many more unit vectors than are required to describe the data in Figure 3. Simply from the description for the construction of the data set, it is clear the two spectra for pure PMMA and pure PVC contain all the shape information within the sequence and therefore the set of six vectors must all lie in a plane. Further, the best choice for the two basis vectors for this plane would be precisely the two pure spectra normalised. In this case, all the quantitative information would lie in just two co-ordinate values with respect to these two unit vectors. What is more, a linear least squares procedure using the PMMA and PVC spectra as target vectors would fully characterise the variations designed into the data set. The key point here is that the set of six vectors displayed as spectra in Figure 3 only needs two vectors and six sets of co-ordinates to characterise the entire data set.
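The construction just described can be mimicked with a short sketch, assuming NumPy. The envelopes below are hypothetical sums of Gaussians standing in for the PMMA and PVC shapes. They are not the data of Figure 3, and the target vectors are left unnormalised so that the fitted coefficients are the blending proportions themselves.

```python
import numpy as np

energy = np.linspace(280.0, 296.0, 322)   # hypothetical binding-energy axis in eV

def gaussian(x, centre, fwhm):
    """Unit-height Gaussian used as a stand-in for a Gaussian/Lorentzian shape."""
    sigma = fwhm / 2.3548
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

# Hypothetical stand-ins for the pure PMMA and PVC C 1s envelopes.
pmma = (gaussian(energy, 285.0, 1.2)
        + 0.4 * gaussian(energy, 286.8, 1.2)
        + 0.4 * gaussian(energy, 289.0, 1.2))
pvc = gaussian(energy, 285.9, 1.3) + gaussian(energy, 287.0, 1.3)

# Six spectra blending linearly from PMMA to PVC, stored as columns.
fractions = np.linspace(0.0, 1.0, 6)
spectra = np.column_stack([(1 - f) * pmma + f * pvc for f in fractions])

# Linear least squares with the two pure spectra as target vectors.
targets = np.column_stack([pmma, pvc])                        # 322 x 2
coeffs, *_ = np.linalg.lstsq(targets, spectra, rcond=None)    # 2 x 6

print(np.linalg.matrix_rank(spectra))   # dimension of the subspace
print(np.round(coeffs, 3))              # rows: PMMA and PVC proportions
```

The six columns span a plane, and the two rows of coefficients recover the interpolation fractions used to build the set.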

 

The problem of characterising a data set by inspection without prior knowledge is a little more difficult. The procedure fundamental to determining dominant shapes within a data set is related to linear least squares: the mechanism by which a linear least squares analysis is performed using the pure PMMA and PVC spectra can also be used to determine a basis for the vector space spanned by the six spectra in Figure 3. Consider the linear least squares problem: given two basis vectors corresponding to the PMMA and PVC spectra, the linear least squares procedure constructs a matrix A of size 322 x 2, where the columns of the matrix A are the two vectors for PMMA and PVC. Following through the analysis of linear least squares results in the normal equations A^T A a = A^T b, where b is an unknown spectral vector and a is the least squares solution for b. It can be shown that solving these normal equations is equivalent to performing a Singular Value Decomposition (SVD) of the matrix A into three matrices, A = U W V^T, where W is a diagonal matrix constructed from the eigenvalues of A^T A (the diagonal elements of W are the square roots of these eigenvalues) and V is the matrix of eigenvectors corresponding to the eigenvalues in W. The matrix U is the same size as A and its columns are eigenvectors of A A^T. The least squares solution is computed using the terms in the Singular Value Decomposition by effectively projecting b, the unknown, onto the columns of U, weighting by the reciprocal of the diagonal elements from W, and using these quantities to form a linear combination of the vectors from V. In summary, the least squares solution for an unknown spectrum in terms of known spectra is completely determined by the SVD matrices. The eigenvectors of A^T A form an orthogonal basis for the least squares solution space, while the columns of the U matrix form an orthogonal basis set for the subspace spanned by the columns of A. What is more, the use of the matrix A^T A in the algorithm guides the choice of basis towards the variations in the data.
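The equivalence between the normal equations and the SVD route can be checked numerically. The following is a minimal sketch, assuming NumPy, in which the two columns of A are random stand-ins for the known spectra and b is a noisy combination of them. None of the numbers come from the data set in Figure 3.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((322, 2))     # columns stand in for the two known spectra
b = A @ np.array([0.3, 0.7]) + 0.01 * rng.standard_normal(322)

# Normal equations: (A^T A) a = A^T b
a_normal = np.linalg.solve(A.T @ A, A.T @ b)

# SVD: A = U W V^T, so a = V W^-1 U^T b
U, w, Vt = np.linalg.svd(A, full_matrices=False)
a_svd = Vt.T @ ((U.T @ b) / w)

# The eigenvalues of A^T A are the squares of the diagonal elements of W.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]

print(np.allclose(a_normal, a_svd))   # True
print(np.allclose(eigvals, w ** 2))   # True
```

The projection onto the columns of U, the division by the diagonal of W and the recombination with V appear explicitly in the line computing `a_svd`.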

 

By analogy, consider each of the six spectra in Figure 3 as a possible basis vector for a linear least squares decomposition of an unknown spectrum. An SVD for the matrix of size 322 x 6, consisting of the six vectors, can be constructed in the manner of the matrices shown in Figure 4. The result is a similar prescription for determining the set of least squares vectors within the subspace spanned by the original six vectors. In other words, the SVD offers up a new basis for the vector space spanned by the original spectra. Further, because the matrices U and V are both orthogonal, their action on any vector preserves the distance metric; therefore, the W matrix contains all the information regarding the relative contributions of these eigenvectors to the description of the subspace. If a diagonal element of W is too small then the corresponding eigenvectors should be excluded from any subsequent calculations. The implication of a small eigenvalue is that the original vectors include a linear dependency and therefore the dimension of the subspace is less than the number of spectra used in the A matrix. What is more, the diagonal elements of W rank the importance of the basis vectors appearing as the column vectors in U, hence the commonly used name for the SVD result, namely, Principal Component Analysis. That is, the principal components are those column vectors in U for which the corresponding eigenvalues are considered to be significant.

 

An SVD performed on the six spectra in Figure 3 is presented in Figure 5. The traces in the active tile (Figure 5) are the unit vectors from the U matrix ranked by magnitude of the corresponding eigenvalue. The tables appearing in the dialog windows in Figure 5 list the eigenvalues and related statistics, plus the co-ordinates of the original spectra with respect to the unit vectors from the U matrix. These results are entirely consistent with the interpretation of the data set given above, namely, only two eigenvalues are non-zero and therefore only two linearly independent basis vectors were found. The outcome of the PCA is therefore that the entire set of six spectra is completely described by two vectors and six pairs of co-ordinates. The Cartesian description (Figure 1) required 322 unit vectors and 322 co-ordinates per vector. The PCA has therefore performed a significant data reduction. However the outstanding question of how to interpret the reduced number of characteristic values still remains.
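A comparable result can be reproduced with a minimal sketch, assuming NumPy and a hypothetical six-spectrum set built by linear interpolation between two made-up envelopes. The threshold used to call a diagonal element of W non-zero is an arbitrary choice for this noise-free case.

```python
import numpy as np

energy = np.linspace(280.0, 296.0, 322)   # hypothetical energy axis in eV

def gaussian(x, centre, fwhm):
    sigma = fwhm / 2.3548
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

# Two hypothetical pure envelopes and six linear blends between them.
pure_a = gaussian(energy, 285.0, 1.2) + 0.5 * gaussian(energy, 288.8, 1.2)
pure_b = gaussian(energy, 285.9, 1.3) + gaussian(energy, 287.0, 1.3)
fractions = np.linspace(0.0, 1.0, 6)
spectra = np.column_stack([(1 - f) * pure_a + f * pure_b for f in fractions])

U, w, Vt = np.linalg.svd(spectra, full_matrices=False)

# Keep the columns of U whose diagonal element of W is not numerically zero.
significant = w > 1e-10 * w[0]
abstract_factors = U[:, significant]          # 322 x 2 unit vectors
loadings = abstract_factors.T @ spectra       # 2 x 6 co-ordinates
reconstructed = abstract_factors @ loadings

print(int(significant.sum()))                 # 2
print(np.allclose(reconstructed, spectra))    # True
```

Two unit vectors and six pairs of co-ordinates reproduce all six spectra, in place of 322 co-ordinates per spectrum.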

 

The unit vectors displayed in Figure 5 are referred to as abstract factors (AF). These abstract factors encapsulate the shape information from the set of six spectra, but have been chosen in such a way that the two significant abstract factors account for the maximum contribution to the entire set of spectra subject to the constraint that the abstract factors are mutually orthogonal. The co-ordinates are often referred to as the loadings for the data with respect to the abstract factors and, for this example, do follow a trend from pure PMMA to pure PVC; however the numerical values relate to the proportions of the abstract factors rather than a direct connection with the pure spectra. The abstract factors are the appropriate linear combination of the PMMA and PVC spectra so that if the data set, as a whole, is approximated by the abstract factors weighted using the corresponding loading, the root mean square error monotonically decreases with the addition of each new abstract factor ordered by eigenvalue. Thus, the reduced set of characteristic values does not clearly identify the chemical information but rather indicates a transition between two states has occurred.
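The monotonic decrease in the root mean square error can be demonstrated with a short sketch, assuming NumPy. The data matrix here is simply random numbers, chosen so that every abstract factor carries some weight. It is not intended to resemble spectra.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((50, 8))     # arbitrary stand-in data: 8 vectors of 50 bins

U, w, Vt = np.linalg.svd(data, full_matrices=False)

# RMS error of the approximation built from the first k abstract factors.
rms = []
for k in range(1, len(w) + 1):
    approx = (U[:, :k] * w[:k]) @ Vt[:k]
    rms.append(np.sqrt(np.mean((data - approx) ** 2)))

for k, err in enumerate(rms, start=1):
    print(k, err)
```

Each additional factor, taken in order of decreasing eigenvalue, lowers the error, and the error vanishes to rounding precision once all factors are included.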

 

Some Cautionary Notes on PCA

 

It is clear from the above discussion that PCA is a mathematical tool offering a data oriented perspective for a large data set. The relationship to the linear least squares procedure provides some comfort to those wishing to dissect an experiment; however, the results of PCA are not always as easy to interpret as might be thought at first sight. The following is a list of cautionary notes, where the intention is to expose these pitfalls rather than condemn PCA as a technique. Indeed, in the right context, PCA is a powerful tool and, for certain types of multi-dimensional data set, applying the theory of PCA as part of an analysis yields remarkable results.

 

The first note relates to curve-fitting and the ability of PCA to assist when constructing a peak model. Once again, consider the data set in Figure 3. It is worth emphasizing that the set of six spectra represents a movement between two data envelopes. Even though each envelope is in turn constructed from individual Gaussian/Lorentzian peaks, the PCA has no way of identifying the underlying peaks, since the variation in the data set was achieved by varying these underlying peaks as a unit. PCA therefore indicates the number of synthetic models required to describe the changes in the data set, not the number of synthetic peaks required to construct the synthetic models. As a result PCA makes a rather weak statement regarding the number of synthetic components in a data set, namely, PCA at best only offers the minimum number of components required and will only deliver the exact number provided all the individual transitions present in a data set vary independently at some point during the course of the experiment. At worst, PCA is inconclusive regarding the true number of trends and is entirely misleading if used to attempt inference of the number of synthetic components in a peak model.
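The distinction between counting envelopes and counting peaks can be seen in a minimal sketch, assuming NumPy. Five hypothetical Gaussian peaks are grouped into two envelopes, and the peak positions, widths and weights are illustrative only. When the peaks move as two units the rank is two, and the full count of five is recovered only when every peak varies independently.

```python
import numpy as np

rng = np.random.default_rng(2)
energy = np.linspace(280.0, 296.0, 322)   # hypothetical energy axis in eV

def gaussian(x, centre, fwhm):
    sigma = fwhm / 2.3548
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

# Five hypothetical synthetic peaks, stored as columns.
centres = (284.8, 286.0, 287.2, 288.4, 289.6)
peaks = np.column_stack([gaussian(energy, c, 1.2) for c in centres])

# Peaks locked together into two envelopes that are then blended.
env_a = peaks[:, :3] @ np.array([1.0, 0.4, 0.4])
env_b = peaks[:, 3:] @ np.array([1.0, 1.0])
blend = np.column_stack([(1 - f) * env_a + f * env_b
                         for f in np.linspace(0.0, 1.0, 6)])

# The same peaks with intensities varying independently in 15 spectra.
independent = peaks @ rng.random((5, 15))

rank_locked = np.linalg.matrix_rank(blend)
rank_independent = np.linalg.matrix_rank(independent)
print(rank_locked, rank_independent)   # 2 5
```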

 

The data set shown in Figure 6 is constructed to illustrate the sources for confusion when interpreting PCA results. The same method for construction of the data shown in Figure 3 is employed for the first column of VAMAS blocks shown selected in the data set in Figure 6.  In the case of the data in Figure 6, three synthetic polymer data envelopes are blended using linear interpolation, namely, PMMA, PVC and PIB. Three additional columns of VAMAS blocks are similar blends of PVC to modified PVC data, where the modified PVC is either energy shifted, intensity scaled or convoluted with a Gaussian to simulate broadening of a spectral feature. All three modifications to an initial line-shape might be expected throughout the course of a multi-layer sputter depth profile, for example. A PCA applied to the data in the first column in Figure 6 is displayed in Figure 7, where the graphical display plots the loadings for each of the fifteen spectra used in the PCA with respect to the three most significant abstract factors. The axes in the 3-D plot represent the first three abstract factors, which as expected, account for all the shape information in the data set. The abstract factors are also plotted in Figure 7 and the coordinates for each of the original spectra used in the PCA are listed.

 

The first observation based on the data in Figure 7 is the importance of both parts of the PCA to the description of the data set. That is, the loadings clearly show an orderly pattern for the spectra when projected onto the first three abstract factors, but the meaning of this pattern is only discerned, if at all, by examining the abstract factors. Now consider a PCA performed using the same data as that used in Figure 7 but with the addition of the set of PVC spectra in the third column of VAMAS blocks in Figure 6. These new additions to the PCA calculation are merely the PVC spectrum scaled by various constants. Clearly, since PVC is already present in the original data set, scaling the PVC spectrum and including these data will not add any new shape information to the data set. Thus performing a PCA on the new set of spectra results in the information displayed in Figure 8, where the PCA result dialog window indicates there are still only three significant (non-zero) eigenvalues, as expected. The addition of these scaled PVC spectra to the PCA calculation has, however, significantly altered the abstract factors. Comparing Figure 7 and Figure 8, it is clear the loadings are also very different and, if the loadings are taken in isolation, the inference would be that a dramatic difference had occurred between the two data sets. An initial response might be that such differences are good and offer an indicator of changes between samples. However, consider a depth profile through an oxide layer on silicon. If the same material is profiled, but using a different number of etch-cycles, then the two data sets would be very similar to the situation in this example. Extra etch-cycles through the metallic silicon layer would bias the data set towards the metallic signal and therefore the two experiments might easily generate two different profiles based on the loadings alone. The important thing to note is that the PCA is characterising changes in not only the sample, but also the experimental parameters used to perform the measurement. Changing the acquisition parameters can introduce major changes in the abstract factors and therefore the loadings.
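The effect of adding scaled copies of a spectrum already present can be sketched as follows, assuming NumPy. The three envelopes are hypothetical single Gaussians rather than the PMMA, PVC and PIB shapes of Figure 6, and the scale factors are arbitrary. The number of significant factors stays at three while the leading abstract factor rotates towards the over-represented spectrum.

```python
import numpy as np

energy = np.linspace(280.0, 296.0, 322)   # hypothetical energy axis in eV

def gaussian(x, centre, fwhm):
    sigma = fwhm / 2.3548
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

# Three hypothetical envelopes blended a -> b -> c in fifteen spectra.
a = gaussian(energy, 285.0, 1.2)
b = gaussian(energy, 287.0, 1.2)
c = gaussian(energy, 289.0, 1.2)
steps = np.linspace(0.0, 1.0, 8)
blend = np.column_stack([(1 - f) * a + f * b for f in steps]
                        + [(1 - f) * b + f * c for f in steps[1:]])

# Extra spectra that are only scaled copies of b: no new shape information.
scaled = np.column_stack([s * b for s in (2.0, 4.0, 6.0, 8.0, 10.0)])
extended = np.hstack([blend, scaled])

U1, w1, _ = np.linalg.svd(blend, full_matrices=False)
U2, w2, _ = np.linalg.svd(extended, full_matrices=False)

rank_before = np.linalg.matrix_rank(blend)
rank_after = np.linalg.matrix_rank(extended)
# |cosine| between the leading abstract factors; 1 would mean unchanged.
overlap = abs(U1[:, 0] @ U2[:, 0])
print(rank_before, rank_after, overlap)
```

An overlap noticeably below one shows that the abstract factors, and hence the loadings, depend on how often each shape is sampled as well as on which shapes are present.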

 

Two further sequences of PVC spectra are included in Figure 6. The consequence of a peak shifting in energy is an additional significant abstract factor. Similarly, a peak broadening also introduces a new abstract factor, so when these two sets of essentially PVC spectra are added to the data from the first column, a PCA for these data results in five significant non-zero eigenvalues and therefore five abstract factors (Figure 9). At this point it is worth noting that the synthetic data used in these tests do not include noise, hence the eigenvalues, while small, are still very distinct from the eigenvalues genuinely equal to zero. The point of introducing these modifications to the PVC spectrum is that PCA, when confronted with commonly found variations in XPS spectra, indicates an increased number of trends in the data set compared to those that might be considered significant with respect to the sample. Bearing in mind that XPS includes other adjustments to peak shapes resulting from background variations, the prospect of assigning the number of trends within a data set based on the number of significant abstract factors becomes daunting.
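How shifts and broadening inflate the number of factors can be shown with a minimal sketch, assuming NumPy and a single hypothetical Gaussian in place of the PVC envelope. The shift of 0.4 eV and the broadened width are arbitrary illustrative values. A wider Gaussian stands in for convolution with a Gaussian.

```python
import numpy as np

energy = np.linspace(280.0, 296.0, 322)   # hypothetical energy axis in eV

def gaussian(x, centre, fwhm):
    sigma = fwhm / 2.3548
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

base = gaussian(energy, 286.0, 1.3)
shifted = gaussian(energy, 286.4, 1.3)      # same shape, shifted in energy
broadened = gaussian(energy, 286.0, 1.8)    # same position, broadened

steps = np.linspace(0.2, 1.0, 5)
scaled_only = np.column_stack([s * base for s in (1.0, 2.0, 3.0, 4.0, 5.0)])
to_shifted = np.column_stack([(1 - f) * base + f * shifted for f in steps])
to_broadened = np.column_stack([(1 - f) * base + f * broadened for f in steps])

with_shift = np.hstack([scaled_only, to_shifted])
with_both = np.hstack([with_shift, to_broadened])

ranks = [int(np.linalg.matrix_rank(x))
         for x in (scaled_only, with_shift, with_both)]
print(ranks)   # [1, 2, 3]
```

One chemical state therefore accounts for three factors once shifting and broadening are present, even though an analyst would regard the sample as unchanged.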

 

Real XPS data differ from the spectra in Figure 6 in that Poisson statistics cause variations in the signal which can mask underlying trends otherwise identified by PCA. The six abstract factors on the left-hand side of Figure 9 are normalised vectors and, while the least significant abstract factor may appear to contain noise, in fact the rapid oscillations in the unit vector are due to numerical round-off errors. The corresponding eigenvalue for the sixth abstract factor is twelve orders of magnitude smaller than the largest eigenvalue and, since the eigenvalues contain the size information from a PCA, the sixth abstract factor makes no contribution to the description of the original data set. The same data as seen in Figure 6, to which simulated noise consistent with Poisson statistical variations is added, results in the abstract factors and eigenvalues on the right-hand side of Figure 9. The sixth abstract factor for the noisy data set is as significant in terms of size as the fifth, which itself resembles noise and therefore has lost much of the shape information clearly present in the noise-free PCA abstract factors. The point to take from this example is that information in the data set clearly distinguishable in the absence of noise is dispersed throughout the abstract factors for data with noise content, and therefore identifying the number of underlying trends is far less straightforward for real data sets.
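The influence of Poisson noise on the eigenvalue spectrum can be reproduced in a short sketch, assuming NumPy. The count levels, background and peak parameters are hypothetical, and the data set has only two genuine trends rather than the five of Figure 9.

```python
import numpy as np

rng = np.random.default_rng(3)
energy = np.linspace(280.0, 296.0, 322)   # hypothetical energy axis in eV

def gaussian(x, centre, fwhm):
    sigma = fwhm / 2.3548
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

pure_a = gaussian(energy, 285.0, 1.2) + 0.5 * gaussian(energy, 288.8, 1.2)
pure_b = gaussian(energy, 285.9, 1.3) + gaussian(energy, 287.0, 1.3)
fractions = np.linspace(0.0, 1.0, 6)
shapes = np.column_stack([(1 - f) * pure_a + f * pure_b for f in fractions])

# Convert to counts on a constant background, then add Poisson noise.
clean = 1000.0 * shapes + 50.0
noisy = rng.poisson(clean).astype(float)

w_clean = np.linalg.svd(clean, compute_uv=False)
w_noisy = np.linalg.svd(noisy, compute_uv=False)

print(w_clean / w_clean[0])   # only two values are far from round-off level
print(w_noisy / w_noisy[0])   # every value is now clearly non-zero
```

In the noise-free case the third and later values sit at round-off level. With noise they rise by many orders of magnitude, so a simple non-zero test no longer counts the trends.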

 

The conclusion from these cautionary notes would seem to be that PCA is of limited use for XPS spectra. However, the data sets have been chosen to illustrate potential difficulties and the primary source of these limitations is the size of the data set. Statistical procedures are best used when there is difficulty understanding large quantities of data. The test data sets are small and very understandable, hence PCA compares poorly to a simple inspection of the data in the light of the methods used to create the spectra. Without prior knowledge and obvious structure in the data set, the structures pulled from the data by PCA would have been more rewarding. What is more, for genuinely large data sets, reducing the data to a small quantity of, albeit abstract, information does provide an insight into the nature of the problem. With these thoughts in mind, the emphasis in the discussion will switch away from small and less appropriate data sets to large quantities of data for which PCA is used to great effect.

 

A True Application for PCA: Spectra and Images

 

The name Principal Component Analysis refers to the ability of an SVD to create sets of basis vectors capable of describing the same sub-space as the original set of non-orthogonal vectors defined by the data, where these basis vectors are ranked according to their information content by the magnitude of the corresponding eigenvalues. It is precisely this ability to measure and order the shape information within the data that makes PCA such a powerful tool when handling large data sets with a relatively small number of trends throughout the full set of data. Even for the small data set shown in Figure 6, when a PCA is performed on these data with simulated noise added, the PCA concentrates the shape information into the abstract factors with the largest eigenvalues. What is more, the random noise is gathered in the abstract factors with small eigenvalues. A PCA of a data set therefore presents an opportunity to partition the original vector space into useful vectors and noise vectors. The success of this operation will depend on the extent to which useful information is mixed with the noise. The situation most favourable to the separation of noise from the useful information occurs when a set of trends is massively oversampled. Happily, these types of experiments are becoming more common.

 

To understand the context for information partitioning using PCA, consider the following analogy. Acquisition of a single spectrum is typically performed using a total acquisition time chosen to result in an acceptable signal-to-noise for the ensuing data. Logically the spectrum is constructed from a set of spectra all assumed to be identical and acquired at smaller dwell-times, all of which when summed equal the desired acquisition time. Summing these spectra and averaging in time provides a very simple one-dimensional linear least squares solution, namely, a single spectrum with adequate signal-to-noise. However, the assumption is that all the summed spectra are equivalent. Now consider a variation on this theme: the problem of acquiring spectra from a sample where there is a possibility the sample is inhomogeneous and therefore different locations on the sample may result in different spectra. This would prevent improving the signal-to-noise in the data set by simply summing the spectra from numerous locations on the surface. If every location on a sample surface results in a different spectrum then there would be no alternative but to acquire each spectrum with the dwell-time appropriate for the desired signal-to-noise. On the other hand, if the number of distinct spectra is small compared to the number of locations measured, then PCA offers a means of partitioning the set of spectra from the different locations into a small number of abstract factors representing the shapes in the spectra and a set of abstract factors characteristic of the noise in the entire data set. If linear combinations of the significant abstract factors are used to reconstruct the spectra, then the result is a set of spectra for which the signal-to-noise is characteristic of the acquisition time for the entire set of spectra rather than the dwell-time of the individual spectra.
The term "reconstruct the spectra" means perform a PCA, identify the significant abstract factors and use only those abstract factors, added together in a sum similar to the sum shown in Figure 2, where all loadings associated with noise abstract factors are set equal to zero. The result is a set of spectra with enhanced signal-to-noise, but each spectrum is still associated with a different location on the sample surface and is free to differ to the extent of the information in the retained abstract factors. Further, the reconstructed spectra are a linear least squares approximation to the original data where the basis vectors are the abstract factors containing the shape information.
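The reconstruction step can be sketched with NumPy on simulated data. The example assumes 2000 locations, each a different mixture of two hypothetical line-shapes, acquired at low counts with Poisson noise. Because the data are simulated the noise-free spectra are known, so the error before and after reconstruction can be compared directly.

```python
import numpy as np

rng = np.random.default_rng(4)
energy = np.linspace(280.0, 296.0, 100)   # hypothetical energy axis in eV

def gaussian(x, centre, fwhm):
    sigma = fwhm / 2.3548
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

a = gaussian(energy, 285.0, 1.5)
b = gaussian(energy, 288.0, 1.5)

# 2000 locations, each with its own mixing fraction, at low count rates.
n_locations = 2000
f = rng.random(n_locations)
truth = 20.0 * (np.outer(a, 1 - f) + np.outer(b, f)) + 5.0   # 100 x 2000
noisy = rng.poisson(truth).astype(float)

# PCA, then rebuild each spectrum from the k significant abstract factors,
# which sets the loadings of all other (noise) factors to zero.
U, w, Vt = np.linalg.svd(noisy, full_matrices=False)
k = 2
reconstructed = (U[:, :k] * w[:k]) @ Vt[:k]

err_raw = np.sqrt(np.mean((noisy - truth) ** 2))
err_reconstructed = np.sqrt(np.mean((reconstructed - truth) ** 2))
print(err_raw, err_reconstructed)
```

Each reconstructed spectrum remains specific to its location, yet its noise level reflects the counts collected over the whole set rather than at that location alone.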

 

The scenario described above is commonly found for techniques where parallel data acquisition is possible for a set of experimental variables. Two obvious examples are ToF SIMS images, where an ion beam rastered over a surface pauses at each pixel to collect a parallel mass spectrum, and stigmatic XPS imaging technology, where parallel images of a surface are acquired at varying energy (Figure 10). Both these examples are logically equivalent in that both result in spectra at each pixel of an image. If a set of XPS images is acquired over a sequence of energy steps spanning a photoelectric peak, then potentially numerous images result from a small number of chemical states, background artefacts and instrumental variations; a situation ideal for decomposition using PCA.

 

As stated earlier, the data matrix can be constructed from spectra or images. The SVD proceeds identically for both apparently different, yet mathematically equivalent, types of data. The analysis of a stack of images as logically depicted in Figure 10 would proceed using the same steps used to process the spectra in Figure 6. The intention, for the image set, is to identify abstract factor images containing spatially significant information and reconstruct the set of images from only those loadings and abstract factors without noise. Once the images are reconstructed from the significant abstract factors, the spectra at each pixel are extracted with a view to defining quantification regions and/or synthetic components. The quantification items are then used to compute RSF scaled images in preparation for atomic concentration calculations on a pixel-by-pixel basis. The true significance of performing the PCA on the image set lies in the dramatic reduction in noise associated with the spectra at a pixel. Before processing, the signal-to-noise in the spectra leaves much to be desired; however, following PCA and omitting the noise components from the reconstruction step, the signal-to-noise is characteristic of the time invested in acquiring the entire set of images. For XPS, the significant gain comes from the quality of the computed background now possible on the spectra at each pixel (Figure 11).
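The image-stack workflow can be outlined with a minimal sketch, assuming NumPy. The stack is simulated as a circular feature with one hypothetical spectrum surrounded by a region with another. The array layout (energy, y, x) and all names are choices made for this illustration and do not describe how CasaXPS stores or processes images.

```python
import numpy as np

rng = np.random.default_rng(5)
n_energy, ny, nx = 40, 32, 32
energy = np.linspace(570.0, 595.0, n_energy)   # hypothetical energy steps in eV

def gaussian(x, centre, fwhm):
    sigma = fwhm / 2.3548
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

inside = 30.0 * gaussian(energy, 577.0, 3.0) + 5.0    # spectrum within the feature
outside = 30.0 * gaussian(energy, 586.0, 3.0) + 5.0   # spectrum elsewhere
yy, xx = np.mgrid[0:ny, 0:nx]
mask = (yy - 16) ** 2 + (xx - 16) ** 2 < 64
truth = np.where(mask, inside[:, None, None], outside[:, None, None])
stack = rng.poisson(truth).astype(float)              # one image per energy step

# Flatten each image to a row: columns are then the spectra at each pixel.
matrix = stack.reshape(n_energy, ny * nx)
U, w, Vt = np.linalg.svd(matrix, full_matrices=False)

k = 2                                                 # factors retained (assumed)
abstract_images = Vt[:k].reshape(k, ny, nx)
reconstructed = ((U[:, :k] * w[:k]) @ Vt[:k]).reshape(n_energy, ny, nx)

pixel_spectrum = reconstructed[:, 10, 20]             # spectrum at one pixel
err_raw = np.sqrt(np.mean((stack - truth) ** 2))
err_reconstructed = np.sqrt(np.mean((reconstructed - truth) ** 2))
print(err_raw, err_reconstructed)
```

The spectrum extracted at a single pixel after reconstruction is what would then be passed to regions or synthetic components for quantification.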

 

Figure 11 shows an example of a spectrum taken from an image pixel, where the two spectra correspond to the unprocessed data and the same data following data reconstruction using the first four abstract factors from a stack of images acquired at 0.3 eV energy steps across a Cr 2p doublet. The sample was an MRS-3 (Magnification Reference Standard) where ITO (In-Sn oxide) circles are surrounded by CrO2. The data were acquired on an un-calibrated Kratos Axis Ultra using a pulse counting delay line detector (DLD). The consequence of using an un-calibrated instrument is that the images contain artefacts due to the instrumental setup and therefore the data set provides an extreme example of mixing information from more than simply chemical state variations. The image in Figure 11 corresponds to the integration of the counts per second between the Cr 2p peaks and background for each pixel in the image set. The variation in intensity in the Cr 2p image is a result of not calibrating the DLD for flat field response before acquiring the data and accounts for one of the four significant abstract factors (Figure 12) required for the reconstruction step. A second abstract factor can be attributed to a further instrumental artefact, namely, peak alignment over the field of view, again engineered into the data set by design. If the instrument were properly tuned and calibrated, these two factors would diminish. However, a third factor corresponding to the background signal contains structure as a consequence of variations due to the In-Sn oxide zones and represents an unwanted mixing of spatial information exposed by PCA. Other possible explanations for these chromium images containing more than one factor might be variations in X-ray flux over the imaged area and topographical effects. Regardless of the source for these additional factors, their inclusion in the reconstruction step is important for a proper move from noise filled data to the smooth curve seen in Figure 11.

 

Interpreting a set of abstract factors as seen in Figure 12 is far from easy, as even the most significant abstract factor is determined not just from the most dominant trend in an image set, but is biased by the minor structures too. Visual inspection might lead one to believe that the top left abstract factor in Figure 12 corresponds to the flat field response of the detection system and that excluding this abstract factor from the reconstruction step would lead to a set of calibrated images. The problem is that the abstract factors are determined as a unit and therefore omitting the most significant factor from the reconstruction of the images, while removing a large proportion of the flat field response, will also remove background and subtle chemical information. The best method for determining chemically meaningful images is to include all the abstract factors of significance, regardless of source, and process the chemical information using the spectra at each pixel with standard, tried and tested quantification procedures from spectral quantification. Quantification items from regions and synthetic components, when combined using the standard atomic concentration formula, produce images free from instrumental, topographical and background artefacts. Quantification of the MRS-3 sample using five photoelectric lines and basic quantification regions produces the atomic concentration images seen in Figure 13. Compare the Cr 2p image in Figure 13 to the image in Figure 11. The intensity variations due to the DLD flat field response are absent from Figure 13 and the image information can be trusted to the same degree as atomic concentration tables generated from conventional spectral data.

 

The subject of how to determine the number of abstract factors of significance to the data description is yet to be discussed. When creating the Cr 2p spectra in Figure 11, four abstract factors were used to approximate each image in the data set. Figure 14 illustrates the linear combination of the abstract factors for one image in the set of images acquired over the Cr 2p doublet energy range. Included in Figure 14 are the four abstract factors combined using four coefficients, which are effectively determined in a least squares sense, the image resulting from the least squares summation, the corresponding raw image and the fifth abstract factor not used in the reconstruction step. Interestingly, none of the four abstract factors individually bears any resemblance to the resulting image. In fact the characteristics visible in the abstract factors are more easily attributed to instrumental and background artefacts; the chromium image is buried within and only reappears once an appropriate linear combination is prescribed. The main reason for including four abstract factors is the lack of any structure in the fifth abstract factor, also shown in Figure 14. The subject of Factor Analysis is filled with statistics for assessing the number of significant components; however, none are foolproof. When the objective is data reconstruction, which is the basis for the application in CasaXPS, assessing the quality of the reconstructed data as a function of the number of abstract factors would seem to be the most reasonable way to proceed. The greatest danger, however, is that minor yet important structure in the data set is blended in with the noise. A consequence of omitting information dispersed throughout the noise abstract factors is that the reconstructed spectra contain non-physical features. Indeed, assessing the data using both reconstructed images and spectra provides a consistency check for the procedure in general. Both must continue to make sense from a physical perspective. It is therefore worth experimenting with the number of abstract factors used to reconstruct the data set and observing the development of noise within the reconstructed data with each additional abstract factor.
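One way to experiment with the number of abstract factors is to watch how the residual between the raw and reconstructed data changes as factors are added. The following is a minimal sketch, assuming NumPy and simulated data with two genuine trends. It is a simple diagnostic for illustration, not one of the established Factor Analysis statistics and not the procedure used in CasaXPS.

```python
import numpy as np

rng = np.random.default_rng(7)
energy = np.linspace(280.0, 296.0, 100)   # hypothetical energy axis in eV

def gaussian(x, centre, fwhm):
    sigma = fwhm / 2.3548
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

a = gaussian(energy, 285.0, 1.5)
b = gaussian(energy, 288.0, 1.5)
f = rng.random(2000)
truth = 20.0 * (np.outer(a, 1 - f) + np.outer(b, f)) + 5.0
noisy = rng.poisson(truth).astype(float)

U, w, Vt = np.linalg.svd(noisy, full_matrices=False)

# RMS residual between raw data and reconstruction for k = 1..5 factors.
residual = []
for k in range(1, 6):
    approx = (U[:, :k] * w[:k]) @ Vt[:k]
    residual.append(np.sqrt(np.mean((noisy - approx) ** 2)))

# Reduction in residual gained by each additional factor.
drops = [residual[i] - residual[i + 1] for i in range(len(residual) - 1)]
print(residual)
print(drops)
```

The residual falls sharply while genuine structure is being added and only creeps down once further factors contribute noise alone. With real data the change is rarely this clear cut, so inspecting the reconstructed spectra and images remains essential.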

 

The significance of noise compared to genuine, but minor, shape information can be important for reconstructed spectra. Figure 9 provides an example where the introduction of noise has masked a real structure in the fifth abstract factor. A problem with XPS data is that noise varies in magnitude with the square root of the counts per data bin. For spectra where the peak is significantly more intense than the background, the noise varies non-linearly over an energy interval and is more significant to a PCA calculation than a broadening or small shift in a peak position. One way to suppress the variations in the noise as a function of intensity is to divide each bin by the square root of the counts per bin. Dividing the counts in a bin by their square root is equivalent to taking the square root of the counts, so squaring the result returns the data to the original state. The spectrum in Figure 11 and the abstract factors in Figure 14 are in fact determined for the data following the square root function applied to each pixel, and the reconstructed image in Figure 14 is therefore the square of the linear combination of abstract factors.
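The effect of the square root on Poisson noise can be checked with a short sketch, assuming NumPy and three arbitrary count levels. The spread of the raw counts grows with the square root of the intensity, whereas after the square root the spread is close to 0.5 at every level.

```python
import numpy as np

rng = np.random.default_rng(6)
levels = np.array([50.0, 500.0, 5000.0])      # hypothetical mean counts per bin
samples = rng.poisson(levels, size=(20000, 3)).astype(float)

raw_std = samples.std(axis=0)                 # grows as sqrt(counts)
sqrt_std = np.sqrt(samples).std(axis=0)       # roughly constant

# Squaring the transformed data returns the original counts.
recovered = np.sqrt(samples) ** 2

print(raw_std)
print(sqrt_std)
print(np.allclose(recovered, samples))        # True
```

With the noise placed on an equal footing across peak and background bins, the PCA is less inclined to spend abstract factors on intensity-dependent noise.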

 

 

 

Figure 1

 

Figure 2

 

 

 

 

 

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

Figure 9: The first six abstract factors for two equivalent data sets, with and without simulated noise.

 

 

Figure 10

 

Figure 11

Figure 12

Figure 13

Figure 14