--PCA 2

The Neurotic Fishbowl

Posted by silenceZRY on 2006/3/21 11:16:19

Table 3: Report generated by a PCA for Al 2p Profile.

Factors    Eigenvalue  RMS          RE (RSD)  IE        IND * 1000  Chi-sq Calc.  Chi-sq Expected
Al 2p/3    4727872000  82.5236      159.3106  34.76442  398.2765    26032.99      2600
Al 2p/8    65086870    3.784555     23.78739  7.340948  65.89305    136.4193      2451
Al 2p/13   393178.7    5.875731     20.74929  7.842494  64.04102    133.7949      2304
Al 2p/18   305001.5    3.48073      17.85783  7.793798  61.79182    78.0949       2159
Al 2p/23   156694.8    3.249367     16.25038  7.929371  63.47803    59.6866       2016
Al 2p/28   98359.06    2.757161     15.2192   8.135007  67.64091    51.1035       1875
Al 2p/33   86168.29    2.48836      14.18397  8.189118  72.36718    41.82519      1736
Al 2p/38   65267.54    2.22333      13.35424  8.242415  79.01916    37.98182      1599
Al 2p/43   53613.14    2.247765     12.61316  8.257255  87.59142    42.46731      1464
Al 2p/48   43569.08    0.1744253    11.97161  8.261198  98.93895    0.1869893     1331
Al 2p/53   32387.23    1.710532     11.52946  8.344409  115.2946    29.01139      1200
Al 2p/58   28174.98    2.021671     11.12658  8.410907  137.3652    33.42906      1071
Al 2p/63   24742       1.261896     10.75487  8.461885  168.0448    15.17705      944
Al 2p/68   23980.27    0.40768      10.29759  8.407944  210.1548    1.649807      819
Al 2p/73   20710.42    1.113217     9.867347  8.33943   274.093     10.69842      696
Al 2p/78   18345.12    1.155456     9.424948  8.226769  376.9979    15.75993      575
Al 2p/83   16109.71    0.7358818    8.960655  8.062218  560.0409    5.605129      456
Al 2p/88   13003.76    0.7303461    8.600543  7.962556  955.6159    5.882052      339
Al 2p/93   12307.15    0.7049177    7.99876   7.608339  1999.69     6.193881      224
Al 2p/98   9285.948    0.000443667  7.554815  7.372745  7554.815    2.12747E-06   111
Al 2p/103  7476.855    1.90305E-13  0         0         0           2.03675E-25   0

The chi-square indicates that the data matrix can be reproduced to within experimental error using two abstract factors: with one factor the calculated chi-square (26032.99) far exceeds the expected value (2600), whereas with two factors it falls to 136.4193 against an expected 2451. This result is consistent with the physical nature of the sample. It is also interesting (from a mathematical standpoint) to note that using all the abstract factors to reproduce the data matrix returns a chi-square of zero (allowing for round-off errors in the computation).
This should always be the case and provides an easy check that the calculation has been performed correctly. All the statistics except the Indicator Function point to two abstract factors being sufficient to span the factor space for the data matrix.

It is worth examining the data set using a subset of the spectra and Target Testing the spectra not used in the PCA. This allows anomalies, such as spikes in the data, to be identified. Selecting a representative subset of spectra for the PCA and then target testing the remainder is particularly useful for large sets of data.

Table 4: Target Test Report for a Subset of Al 2p Data Set.

Target    AET       REP       RET       SPOIL     Al 2p/3  Al 2p/8
Al 2p/23  20.93032  10.38374  18.17295  1.750135  0.39033  -0.04947
Al 2p/28  23.83028  11.05117  21.11288  1.910467  0.41839  0.017257
Al 2p/33  19.83736  11.47927  16.17861  1.409376  0.42997  0.065749
Al 2p/38  19.8507   12.01348  15.80274  1.315418  0.44268  0.10609
Al 2p/43  19.9069   12.46508  15.52116  1.245171  0.4531   0.133366
Al 2p/48  57.16561  12.70691  55.73546  4.386233  0.45854  0.14688
Al 2p/53  15.37333  13.18052  7.912861  0.600345  0.46791  0.174614
Al 2p/58  21.39836  13.30379  16.76004  1.259795  0.46901  0.184805
Al 2p/63  19.92528  13.5238   14.63296  1.082016  0.47386  0.195062
Al 2p/68  27.73522  13.78354  24.06775  1.746122  0.48087  0.203826
Al 2p/73  19.10189  13.88023  13.12332  0.945469  0.48192  0.210646
Al 2p/78  20.9575   13.98145  15.61204  1.116625  0.48264  0.218455
Al 2p/83  19.03813  14.15492  12.7314   0.899433  0.48483  0.229382
Al 2p/88  18.38591  14.11378  11.78317  0.83487   0.48374  0.228046

The SPOIL function and AET statistics (Table 4) show that Al 2p/48 differs in some respect from the other spectra in the list tested. The spectrum in question corresponds to the trace displaying the spikes seen in Figure 6. Al 2p/68 also merits a closer look: its AET value is high compared to the other spectra.
Such spectra may highlight interfaces where either new chemical states appear (either directly from features in the data or indirectly through changes in the background due to features outside the acquisition region) or energy shifts due to sample charging have altered the characteristics of the data.

The PCA report in Table 3 includes the spectrum labelled Al 2p/48 in the data matrix. The consequence of not removing the spikes is apparent in the 3-D factor space shown in Figure 9, where the abstract factor with the third largest eigenvalue clearly contains spikes and projection point number 10, derived from the Al 2p/48 spectrum, is obviously a statistical outlier.

PCA and CasaXPS

Principal Component Analysis is offered on the "processing" window. The options on the property page labelled "PCA" allow spectra to be transformed into abstract factors according to a number of regimes. These include covariance about the origin and correlation about the origin. Each of these pre-processing methods may be applied with or without background subtraction.

Quantification regions must be defined for each spectrum included in the factor analysis. In addition, each spectrum must have the same number of acquisition channels as the others in the set of spectra to be analysed. The first step in the calculation replaces the values in each spectrum by the result of interpolating the data within the defined quantification region for the spectrum. This is designed to allow energy shifts to be removed from the data used in the factor analysis.

The quantification region also provides the type of background to the spectrum. Performing the analysis on background-subtracted data attempts to remove artifacts in the spectrum that derive from other peaks within the vicinity of the energy region. Background contributions can be significant in PCA.
Additional primary abstract factors are often introduced as a consequence of changes in the background rather than the underlying peaks within the region of interest. The presence of such abstract factors can be viewed as information extracted from the data, although in many circumstances they can lead to incorrect synthetic models if background contributions are misunderstood.

A factor analysis is performed on the set of spectra displayed in the active tile. Although PCA is offered as a processing option, it is the only processing option that acts on a collection of spectra. Any other option from the processing window would only act upon the first VAMAS block in a selection when that selection is displayed in a single tile.

The principal component analysis is performed when the "Apply" button is pressed. Each spectrum displayed in the active tile is replaced by the computed abstract factors. The order of the VAMAS blocks containing the spectra is used as the order for the abstract factors. The factor corresponding to the largest eigenvalue is entered first. Subsequent blocks receive the abstract factors in descending order defined by the size of the corresponding eigenvalues.

A report showing the statistics for understanding the dimensionality of the factor space appears in a dialog window. A button labelled "PCA Report" allows the current PCA report to be re-displayed. Care should be exercised since the values are subject to any additional processing (including PCA) that may subsequently be applied to any of the spectra included in the original analysis.

The PCA property page includes a button to reset the processing operations for every spectrum displayed in the active tile. This allows a PCA calculation to be undone in one stroke. It will also undo any processing previously performed on the data.
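The decomposition and the dimensionality statistics can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not CasaXPS code: the function names are hypothetical, and the formulas for RE, IE, IND and the expected chi-square are the standard Malinowski-style expressions, assumed here because they reproduce the columns of Table 3 when 131 channels and 21 spectra are used (an inference from the tabulated numbers, not something the text states). The sketch also assumes there are at least as many channels as spectra.

```python
import numpy as np


def pca_about_origin(data):
    """Decompose a data matrix (r channels x c spectra) into abstract factors.

    Uses covariance about the origin, Z = D^T D, with no mean-centring.
    Returns (eigenvalues, factors, loadings), largest eigenvalue first,
    such that data == factors @ loadings up to round-off.
    """
    data = np.asarray(data, dtype=float)
    z = data.T @ data                       # c x c covariance about the origin
    eigvals, eigvecs = np.linalg.eigh(z)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder so the largest comes first
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    loadings = eigvecs[:, order].T          # row k holds the loadings of factor k
    factors = data @ loadings.T             # column k is abstract factor k
    return eigvals, factors, loadings


def dimensionality_stats(eigvals, n_channels):
    """Return rows (n, RE, IE, IND, expected chi-square) for n = 1 .. c-1."""
    eigvals = np.asarray(eigvals, dtype=float)
    c = eigvals.size
    rows = []
    for n in range(1, c):
        residual = eigvals[n:].sum()        # eigenvalues left out by n factors
        re = np.sqrt(residual / (n_channels * (c - n)))  # real error (RSD)
        ie = re * np.sqrt(n / c)            # imbedded error
        ind = re / (c - n) ** 2             # indicator function
        chi_expected = (n_channels - n) * (c - n)
        rows.append((n, re, ie, ind, chi_expected))
    return rows


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical test data: two peak shapes mixed in varying amounts plus noise.
    x = np.linspace(-1.0, 1.0, 131)
    shapes = np.stack([np.exp(-((x + 0.3) / 0.2) ** 2),
                       np.exp(-((x - 0.3) / 0.2) ** 2)], axis=1)
    mix = rng.uniform(0.0, 1.0, size=(2, 21))
    data = 1000.0 * shapes @ mix + rng.normal(0.0, 1.0, size=(131, 21))

    eigvals, factors, loadings = pca_about_origin(data)
    for n, re, ie, ind, chi in dimensionality_stats(eigvals, 131)[:4]:
        print(n, round(re, 3), round(ie, 3), round(1000 * ind, 3), chi)
```

Multiplying all the factors by all the loadings returns the original matrix, which mirrors the check mentioned earlier that the chi-square vanishes when every abstract factor is used.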
PCA is aimed at the raw data; the chi-square statistic is referenced to the raw data and has an undefined meaning when the data have been processed prior to performing factor analysis.

Target Factor Analysis in the form of target testing is also available on the PCA property page. Following a PCA, candidates for the physically meaningful components may be assessed individually or collectively. Choose an abstract factor from the PCA and enter this factor into the active tile. Then select the number of primary abstract factors for use in the target test procedure. A text field is offered on the PCA property page for this purpose and is found in the section headed "Target FA". Next, select the target test spectra in the Browser view and press the button labelled "TFA Apply". A report detailing the statistics calculated from the TFA procedure will appear in a dialog window.

The TFA report may be written to file in an ASCII format with TAB-separated columns. When pressed, any of the buttons above the columns on the report will display a file dialog window from which the output text file can be specified. This method for saving a report to file is also used by the PCA report (above) and the Linear Regression report described below.

Once a set of target spectra has been identified, these spectra can be used to reproduce the original set of spectra through a linear regression step. Enter the set of target spectra into the active tile; then select the original spectra in the Browser view. Press the button labelled "Linear Regression". A report shows the RMS differences between each of the original spectra and the predicted spectra calculated from a linear combination of the set of target spectra displayed in the active tile. The loadings used to compute the predicted spectra are listed in the report. The report may be written to file using a similar procedure to the TFA report described above.
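The linear regression step amounts to a least-squares fit of each original spectrum against the target spectra. The sketch below illustrates the idea with NumPy; the function names are hypothetical, and the exact normalisation CasaXPS uses for its RMS figure is not stated in the text, so a plain root-mean-square over channels is assumed. The second helper encodes a relation that the values in Table 4 appear to satisfy, namely RET squared equals AET squared minus REP squared, with SPOIL equal to RET divided by REP.

```python
import numpy as np


def regress_on_targets(originals, targets):
    """Fit each original spectrum as a linear combination of target spectra.

    originals: r x m array, one spectrum per column.
    targets:   r x k array, one target spectrum per column.
    Returns (loadings, predicted, rms): loadings is k x m, predicted is r x m,
    and rms[j] is the RMS difference between spectrum j and its prediction.
    """
    originals = np.asarray(originals, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Least-squares solution of targets @ loadings = originals, column by column.
    loadings, *_ = np.linalg.lstsq(targets, originals, rcond=None)
    predicted = targets @ loadings
    # Assumed definition: root-mean-square of the residual over all channels.
    rms = np.sqrt(np.mean((originals - predicted) ** 2, axis=0))
    return loadings, predicted, rms


def ret_and_spoil(aet, rep):
    """Derive RET and SPOIL from AET and REP (relation inferred from Table 4)."""
    ret = np.sqrt(max(aet ** 2 - rep ** 2, 0.0))   # guard against AET < REP
    return ret, ret / rep
```

For the Al 2p/23 row of Table 4, `ret_and_spoil(20.93032, 10.38374)` gives approximately 18.173 and 1.750, in agreement with the tabulated RET and SPOIL values.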
Viewing the Data in Factor Space

CasaXPS offers an option labelled "Factor Space" on the "Geometry" property page of the "Tile Display" dialog window. If selected, the VAMAS blocks displayed in a tile are used to define the axes for a subspace and the original data are plotted, if possible, as a set of co-ordinates with respect to these axes. The plot represents a projection of the data space onto the subspace defined by a set of two or three abstract factors.

The abstract factors defining the axes are graphed together with a list of the co-ordinate values for each of the spectra projected onto the subspace spanned by the chosen abstract factors (Figure 9). A 3-dimensional plot provides a visual interpretation for the spectra. Patterns formed by the spectra highlight trends within the data set, and the relative importance of the abstract factors can be examined. A plot in which the axes are defined by unimportant factors generally appears random, while factors that are significant when describing the data typically produce plots containing recognisable structure.
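One way to picture the projection is as an inner product of each spectrum with each chosen abstract factor. The sketch below assumes the factors are mutually orthogonal (as PCA factors are) and scales them to unit length before projecting; whether CasaXPS normalises the axes in this way is not stated in the text, and the function name is hypothetical.

```python
import numpy as np


def factor_space_coordinates(spectra, factors):
    """Project spectra onto the subspace spanned by two or three abstract factors.

    spectra: r x m array, one spectrum per column.
    factors: r x k array (k = 2 or 3), one abstract factor per column; the
             columns are assumed to be mutually orthogonal.
    Returns an m x k array; row j holds the co-ordinates of spectrum j.
    """
    spectra = np.asarray(spectra, dtype=float)
    factors = np.asarray(factors, dtype=float)
    if factors.shape[1] not in (2, 3):
        raise ValueError("expected two or three abstract factors")
    axes = factors / np.linalg.norm(factors, axis=0)   # unit-length axes
    return spectra.T @ axes                            # inner products = co-ordinates
```

Plotting the rows of the returned array as points gives the kind of scatter described above: an outlying spectrum, such as one containing spikes, shows up as a point lying well away from the pattern formed by the rest.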
