Table 3: Report generated by a PCA for Al 2p Profile.
| Factors | Eigenvalue | RMS | RE (RSD) | IE | IND * 1000 | Chi-sq Calc. | Chi-sq Expected |
|---|---|---|---|---|---|---|---|
| Al 2p/3 | 4727872000 | 82.5236 | 159.3106 | 34.76442 | 398.2765 | 26032.99 | 2600 |
| Al 2p/8 | 65086870 | 3.784555 | 23.78739 | 7.340948 | 65.89305 | 136.4193 | 2451 |
| Al 2p/13 | 393178.7 | 5.875731 | 20.74929 | 7.842494 | 64.04102 | 133.7949 | 2304 |
| Al 2p/18 | 305001.5 | 3.48073 | 17.85783 | 7.793798 | 61.79182 | 78.0949 | 2159 |
| Al 2p/23 | 156694.8 | 3.249367 | 16.25038 | 7.929371 | 63.47803 | 59.6866 | 2016 |
| Al 2p/28 | 98359.06 | 2.757161 | 15.2192 | 8.135007 | 67.64091 | 51.1035 | 1875 |
| Al 2p/33 | 86168.29 | 2.48836 | 14.18397 | 8.189118 | 72.36718 | 41.82519 | 1736 |
| Al 2p/38 | 65267.54 | 2.22333 | 13.35424 | 8.242415 | 79.01916 | 37.98182 | 1599 |
| Al 2p/43 | 53613.14 | 2.247765 | 12.61316 | 8.257255 | 87.59142 | 42.46731 | 1464 |
| Al 2p/48 | 43569.08 | 0.1744253 | 11.97161 | 8.261198 | 98.93895 | 0.1869893 | 1331 |
| Al 2p/53 | 32387.23 | 1.710532 | 11.52946 | 8.344409 | 115.2946 | 29.01139 | 1200 |
| Al 2p/58 | 28174.98 | 2.021671 | 11.12658 | 8.410907 | 137.3652 | 33.42906 | 1071 |
| Al 2p/63 | 24742 | 1.261896 | 10.75487 | 8.461885 | 168.0448 | 15.17705 | 944 |
| Al 2p/68 | 23980.27 | 0.40768 | 10.29759 | 8.407944 | 210.1548 | 1.649807 | 819 |
| Al 2p/73 | 20710.42 | 1.113217 | 9.867347 | 8.33943 | 274.093 | 10.69842 | 696 |
| Al 2p/78 | 18345.12 | 1.155456 | 9.424948 | 8.226769 | 376.9979 | 15.75993 | 575 |
| Al 2p/83 | 16109.71 | 0.7358818 | 8.960655 | 8.062218 | 560.0409 | 5.605129 | 456 |
| Al 2p/88 | 13003.76 | 0.7303461 | 8.600543 | 7.962556 | 955.6159 | 5.882052 | 339 |
| Al 2p/93 | 12307.15 | 0.7049177 | 7.99876 | 7.608339 | 1999.69 | 6.193881 | 224 |
| Al 2p/98 | 9285.948 | 0.000443667 | 7.554815 | 7.372745 | 7554.815 | 2.12747E-06 | 111 |
| Al 2p/103 | 7476.855 | 1.90305E-13 | 0 | 0 | 0 | 2.03675E-25 | 0 |
The chi-square indicates that the data matrix can be reproduced to within experimental error using two abstract factors: with one factor the calculated chi-square (26032.99) far exceeds the expected value (2600), whereas with two factors it drops to 136.4193, below the expected value of 2451. This result is consistent with the physical nature of the sample. It is also interesting (from a mathematical standpoint) to note that using all the abstract factors to reproduce the data matrix returns a chi-square of zero (allowing for round-off errors in the computation). This should always be the case and provides an easy check that the calculation has been performed correctly.
All the statistics except the Indicator Function point to two abstract factors being sufficient to span the factor space for the data matrix. (The IND column of Table 3 reaches its minimum at the fourth factor instead.)
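The RE, IE, IND and expected chi-square columns of Table 3 can be recomputed from the eigenvalue column alone. The sketch below assumes Malinowski's standard definitions of these statistics and assumes 131 acquisition channels per spectrum; neither is stated in this section, but the channel count is inferred from the expected chi-square column (2600 = 130 x 20). The function name `pca_statistics` is hypothetical and is not part of CasaXPS.

```python
import math

def pca_statistics(eigenvalues, n_channels):
    """Return (n, RE, IE, IND, expected chi-square) for n = 1 .. c-1 factors.

    eigenvalues: all eigenvalues in descending order (one per spectrum).
    n_channels:  acquisition channels per spectrum (assumed, see lead-in).
    """
    c = len(eigenvalues)
    rows = []
    for n in range(1, c):
        residual = sum(eigenvalues[n:])            # eigenvalues of discarded factors
        re = math.sqrt(residual / (n_channels * (c - n)))   # real error (RSD)
        ie = re * math.sqrt(n / c)                 # imbedded error
        ind = re / (c - n) ** 2                    # indicator function
        expected_chi_sq = (n_channels - n) * (c - n)
        rows.append((n, re, ie, ind, expected_chi_sq))
    return rows

# Eigenvalue column of Table 3
eigenvalues = [
    4727872000, 65086870, 393178.7, 305001.5, 156694.8, 98359.06, 86168.29,
    65267.54, 53613.14, 43569.08, 32387.23, 28174.98, 24742, 23980.27,
    20710.42, 18345.12, 16109.71, 13003.76, 12307.15, 9285.948, 7476.855,
]

stats = pca_statistics(eigenvalues, n_channels=131)
for n, re, ie, ind, expected in stats[:4]:
    print(n, round(re, 4), round(ie, 4), round(ind * 1000, 4), expected)

# Number of factors at which the indicator function is smallest
ind_minimum = min(stats, key=lambda row: row[3])[0]
print("IND minimum at n =", ind_minimum)
```

Under these assumptions the computed values should agree with the tabulated columns to within rounding, which also makes the sketch a convenient cross-check on a report.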
It is worth examining the data set using a subset of the spectra and target testing the spectra not used in the PCA. This allows anomalies such as spikes in the data to be identified. Selecting a representative subset of spectra for the PCA and then target testing the remainder is particularly useful for large sets of data.
Table 4: Target Test Report for a Subset of Al 2p Data Set.
| Target | AET | REP | RET | SPOIL | Al 2p/3 | Al 2p/8 |
|---|---|---|---|---|---|---|
| Al 2p/23 | 20.93032 | 10.38374 | 18.17295 | 1.750135 | 0.39033 | -0.04947 |
| Al 2p/28 | 23.83028 | 11.05117 | 21.11288 | 1.910467 | 0.41839 | 0.017257 |
| Al 2p/33 | 19.83736 | 11.47927 | 16.17861 | 1.409376 | 0.42997 | 0.065749 |
| Al 2p/38 | 19.8507 | 12.01348 | 15.80274 | 1.315418 | 0.44268 | 0.10609 |
| Al 2p/43 | 19.9069 | 12.46508 | 15.52116 | 1.245171 | 0.4531 | 0.133366 |
| Al 2p/48 | 57.16561 | 12.70691 | 55.73546 | 4.386233 | 0.45854 | 0.14688 |
| Al 2p/53 | 15.37333 | 13.18052 | 7.912861 | 0.600345 | 0.46791 | 0.174614 |
| Al 2p/58 | 21.39836 | 13.30379 | 16.76004 | 1.259795 | 0.46901 | 0.184805 |
| Al 2p/63 | 19.92528 | 13.5238 | 14.63296 | 1.082016 | 0.47386 | 0.195062 |
| Al 2p/68 | 27.73522 | 13.78354 | 24.06775 | 1.746122 | 0.48087 | 0.203826 |
| Al 2p/73 | 19.10189 | 13.88023 | 13.12332 | 0.945469 | 0.48192 | 0.210646 |
| Al 2p/78 | 20.9575 | 13.98145 | 15.61204 | 1.116625 | 0.48264 | 0.218455 |
| Al 2p/83 | 19.03813 | 14.15492 | 12.7314 | 0.899433 | 0.48483 | 0.229382 |
| Al 2p/88 | 18.38591 | 14.11378 | 11.78317 | 0.83487 | 0.48374 | 0.228046 |
The SPOIL function and AET statistics (Table 4) show that Al 2p/48 differs in some respect from the other spectra in the list tested. The spectrum in question corresponds to the trace displaying the spikes seen in Figure 6. Al 2p/68 also merits a closer look, since its AET value is high compared with the other spectra. Such spectra may highlight interfaces where either new chemical states appear (either directly from features in the data or indirectly through changes in the background due to features outside the acquisition region) or energy shifts due to sample charging have altered the characteristics of the data.
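The columns of Table 4 are not independent, which gives a quick consistency check on a target test report. The sketch below assumes Malinowski's usual relations, AET^2 = REP^2 + RET^2 and SPOIL = RET / REP; these are not stated in this section, but the tabulated values are consistent with them. The variable and function names are illustrative only.

```python
import math

# (target, AET, REP, RET, SPOIL) copied from Table 4
target_tests = [
    ("Al 2p/23", 20.93032, 10.38374, 18.17295, 1.750135),
    ("Al 2p/28", 23.83028, 11.05117, 21.11288, 1.910467),
    ("Al 2p/33", 19.83736, 11.47927, 16.17861, 1.409376),
    ("Al 2p/38", 19.8507, 12.01348, 15.80274, 1.315418),
    ("Al 2p/43", 19.9069, 12.46508, 15.52116, 1.245171),
    ("Al 2p/48", 57.16561, 12.70691, 55.73546, 4.386233),
    ("Al 2p/53", 15.37333, 13.18052, 7.912861, 0.600345),
    ("Al 2p/58", 21.39836, 13.30379, 16.76004, 1.259795),
    ("Al 2p/63", 19.92528, 13.5238, 14.63296, 1.082016),
    ("Al 2p/68", 27.73522, 13.78354, 24.06775, 1.746122),
    ("Al 2p/73", 19.10189, 13.88023, 13.12332, 0.945469),
    ("Al 2p/78", 20.9575, 13.98145, 15.61204, 1.116625),
    ("Al 2p/83", 19.03813, 14.15492, 12.7314, 0.899433),
    ("Al 2p/88", 18.38591, 14.11378, 11.78317, 0.83487),
]

def implied_statistics(rep, ret):
    """AET and SPOIL implied by REP and RET under the assumed relations."""
    return math.hypot(rep, ret), ret / rep

# Rank the targets so that the most suspicious spectra come first
ranked_by_spoil = sorted(target_tests, key=lambda row: row[4], reverse=True)
ranked_by_aet = sorted(target_tests, key=lambda row: row[1], reverse=True)

print("Largest SPOIL:", ranked_by_spoil[0][0])
print("Two largest AET:", ranked_by_aet[0][0], "and", ranked_by_aet[1][0])
```

Sorting by SPOIL or AET is only a screening aid; whether a flagged spectrum reflects spikes, a new chemical state or charging still has to be judged from the data.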
The PCA report in Table 3 includes the spectrum labelled Al 2p/48 in the data matrix. The consequence of not removing the spikes is apparent in the 3-D factor space shown in Figure 9, where the abstract factor with the third-largest eigenvalue clearly contains spikes and projection point number 10, derived from the Al 2p/48 spectrum, is obviously a statistical outlier.
PCA and CasaXPS
Principal Component Analysis is offered on the "processing" window. The options on the property page labelled "PCA" allow spectra to be transformed into abstract factors according to a number of regimes. These include covariance about the origin and correlation about the origin. Each of these pre-processing methods may be applied with and without background subtraction.
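The difference between the two pre-processing regimes can be illustrated with a small sketch. It assumes the textbook definitions: covariance about the origin forms sums of products of the raw spectra without mean-centring, and correlation about the origin does the same after scaling each spectrum to unit length. Whether CasaXPS normalises in exactly this way is an assumption, and the toy spectra are invented for illustration.

```python
import math

def covariance_about_origin(spectra):
    """Z[i][j] = sum over channels of spectra[i][k] * spectra[j][k] (no mean-centring)."""
    return [[sum(a * b for a, b in zip(si, sj)) for sj in spectra] for si in spectra]

def correlation_about_origin(spectra):
    """Same matrix after scaling every spectrum to unit length."""
    normalised = []
    for spectrum in spectra:
        norm = math.sqrt(sum(v * v for v in spectrum))
        normalised.append([v / norm for v in spectrum])
    return covariance_about_origin(normalised)

# Three toy "spectra" of three channels each; the second is twice the first
toy_spectra = [[1.0, 2.0, 2.0], [2.0, 4.0, 4.0], [0.0, 3.0, 4.0]]

cov = covariance_about_origin(toy_spectra)
corr = correlation_about_origin(toy_spectra)
print(cov[0][1], corr[0][1], corr[0][2])
```

In the covariance form an intense spectrum dominates the matrix, whereas in the correlation form every spectrum carries equal weight, so the first two toy spectra become indistinguishable.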
Quantification regions must be defined for each spectrum included in the factor analysis. In addition, each spectrum must have the same number of acquisition channels as the others in the set of spectra to be analysed. The first step in the calculation replaces the values in each spectrum by the result of interpolating the data within the defined quantification region for the spectrum. This is designed to allow energy shifts to be removed from the data used in the factor analysis.
The quantification region also provides the type of background to the spectrum. Performing the analysis on background subtracted data attempts to remove artifacts in the spectrum that derive from other peaks within the vicinity of the energy region. Background contributions can be significant in PCA. Additional primary abstract factors are often introduced as a consequence of changes in the background rather than the underlying peaks within the region of interest. The presence of such abstract factors can be viewed as information extracted from the data, although in many circumstances they can lead to incorrect synthetic models if background contributions are misunderstood.
A factor analysis is performed on the set of spectra displayed in the active tile. Although PCA is offered as a processing option, it is the only processing option that acts on a collection of spectra. Any other option from the processing window would only act upon the first VAMAS block in a selection when that selection is displayed in a single tile.
The principal component analysis is performed when the "Apply" button is pressed. Each spectrum displayed in the active tile is replaced by the computed abstract factors. The order of the VAMAS blocks containing the spectra is used as the order for the abstract factors. The factor corresponding to the largest eigenvalue is entered first. Subsequent blocks receive the abstract factors in descending order defined by the size of the corresponding eigenvalues. A report showing the statistics for understanding the dimensionality of the factor space appears in a dialog window.
A button labelled "PCA Report" allows the current PCA report to be re-displayed. Care should be exercised since the values are subject to any additional processing (including PCA) that may subsequently be applied to any of the spectra included in the original analysis.
The PCA property page includes a button to reset the processing operations for every spectrum displayed in the active tile. This allows a PCA calculation to be undone in one stroke. It will also undo any processing previously performed on the data. PCA is aimed at the raw data; the chi-square statistic is referenced to the raw data and has an undefined meaning when the data have been processed prior to performing factor analysis.
Target Factor Analysis in the form of target testing is also available on the PCA property page. Following a PCA, candidates for the physically meaningful components may be assessed individually or collectively. Choose an abstract factor from the PCA and enter this factor into the active tile. Then select the number of primary abstract factors for use in the target test procedure. A text field is offered on the PCA property page for this purpose and is found in the section headed "Target FA". Next, select the target test spectra in the Browser view and press the button labelled "TFA Apply". A report detailing the statistics calculated from the TFA procedure will appear in a dialog window.
The TFA report may be written to file in an ASCII format with TAB separated columns. When pressed, any of the buttons above the columns on the report will display a file dialog window from which the output text-file can be specified. This method for saving a report to file is used by the PCA report (above) and the Linear Regression Report described below.
Once a set of target spectra has been identified, these spectra can be used to reproduce the original set of spectra through a linear regression step. Enter the set of target spectra into the active tile; then select the original spectra in the Browser view. Press the button labelled "Linear Regression". A report shows the RMS differences between each of the original spectra and the predicted spectra calculated from a linear combination of the set of target spectra displayed in the active tile. The loadings used to compute the predicted spectra are listed in the report. The report may be written to file using a similar procedure to the TFA report described above.
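The regression step can be sketched as an ordinary least-squares fit of one original spectrum to the target spectra. This is an illustration under stated assumptions rather than the CasaXPS implementation: the function names `solve` and `regress` are hypothetical, the fit is done through the normal equations, and the RMS is taken as the root of the mean squared difference over the channels.

```python
import math

def solve(a, b):
    """Solve the small square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]          # partial pivoting
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def regress(targets, spectrum):
    """Return (loadings, predicted spectrum, RMS difference)."""
    gram = [[sum(x * y for x, y in zip(ti, tj)) for tj in targets] for ti in targets]
    rhs = [sum(x * y for x, y in zip(t, spectrum)) for t in targets]
    loadings = solve(gram, rhs)
    predicted = [sum(w * t[k] for w, t in zip(loadings, targets))
                 for k in range(len(spectrum))]
    rms = math.sqrt(sum((p - s) ** 2 for p, s in zip(predicted, spectrum))
                    / len(spectrum))
    return loadings, predicted, rms

# Toy data: the "original" spectrum is 0.7 * first target + 0.3 * second target
targets = [[1.0, 2.0, 1.0, 0.0], [0.0, 1.0, 2.0, 1.0]]
original = [0.7, 1.7, 1.3, 0.3]

loadings, predicted, rms = regress(targets, original)
print(loadings, rms)
```

For real data the RMS will not vanish; a large value for a particular spectrum suggests that the chosen targets do not span it.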
Viewing the Data in Factor Space
CasaXPS offers an option on the "Geometry" property page on the "Tile Display" dialog window labelled "Factor Space". If selected, the VAMAS blocks displayed in a tile are used to define the axes for a subspace and the original data are plotted, if possible, as a set of co-ordinates with respect to these axes. The plot represents a projection of the data space onto the subspace defined by a set of two or three abstract factors.
The abstract factors defining the axes are graphed together with a list of the co-ordinate values for each of the spectra projected onto the subspace spanned by the chosen abstract factors (Figure 9). A 3-dimensional plot provides a visual interpretation for the spectra. Patterns formed by the spectra highlight trends within the data set and the relative importance of the abstract factors can be examined. A plot in which the axes are defined by unimportant factors generally appears random, while factors that are significant when describing the data typically produce plots containing recognisable structure.
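The co-ordinates in such a plot can be understood as dot products. The sketch below assumes the abstract factors are orthonormal, in which case the co-ordinate of a spectrum along each axis is simply its dot product with the corresponding factor; the function name `project` and the toy vectors are illustrative and do not come from CasaXPS.

```python
def project(spectra, factors):
    """Co-ordinates of each spectrum with respect to the given abstract factors."""
    return [[sum(s * f for s, f in zip(spectrum, factor)) for factor in factors]
            for spectrum in spectra]

# Two orthonormal toy factors spanning a plane in a 3-channel data space
factor_1 = [0.6, 0.8, 0.0]
factor_2 = [-0.8, 0.6, 0.0]

# Both toy spectra equal 2 * factor_1 + 1 * factor_2 within the plane;
# the second also has a component outside the plane (third channel)
in_plane = [0.4, 2.2, 0.0]
out_of_plane = [0.4, 2.2, 5.0]

coordinates = project([in_plane, out_of_plane], [factor_1, factor_2])
print(coordinates)
```

Both toy spectra land on the same point because the projection discards whatever lies outside the chosen subspace. This is why axes built from significant factors show structure, while the discarded part is left to the remaining factors.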