Principal Component Analysis

PCA is a dimensionality reduction technique. It takes a dataset with multiple features and transforms it into a new coordinate system whose axes are ordered by how much of the data's variance they capture, thereby simplifying the data while retaining as much of that variance as possible. Performing PCA involves a series of steps:

  1. First, we center the data by subtracting the mean of each feature from the data points. This ensures that the new coordinate system is centered at the origin.
  2. Next, we calculate the covariance matrix of the centered data to summarize the relationships between the different features. The covariance matrix can be calculated as,
    1. Covariance Matrix: $C = \frac{1}{n-1} X_c^{\top} X_c$, where $X_c$ is the centered data matrix (one row per data point, $n$ rows in total) and $(n-1)$ is used for unbiased estimation of the covariance.
  3. After obtaining the covariance matrix, the next step is to calculate its eigenvectors and eigenvalues. The eigenvectors represent the directions (principal components) of maximum variance in the data, and the eigenvalues indicate the amount of variance explained by each component.
    1. Eigenvalue Problem: $C v_i = \lambda_i v_i$, where $C$ is the covariance matrix, $v_i$ is the ith eigenvector and $\lambda_i$ is the ith eigenvalue.
  4. To reduce the dimensionality of the data, we sort the eigenvalues in descending order and select the top k eigenvectors (principal components), that is, those with the k largest eigenvalues. These k principal components capture the most variance in the data. The choice of k depends on the desired level of dimensionality reduction.
  5. Finally, we transform the original data into the new coordinate system defined by the selected principal components. This transformation is achieved by multiplying the centered data matrix by the matrix of selected principal components:
    1. Data Transformation: $Y = X_c W$, where $X_c$ is the centered data matrix, $W$ is a matrix containing the top k eigenvectors as its columns and $Y$ is the transformed data matrix.
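
The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not a reference implementation. It assumes the data is a NumPy array with one row per data point and one column per feature. The function name `pca`, the returned explained-variance ratio and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def pca(X, k):
    """Project X (n data points x d features) onto its top-k principal components."""
    # Step 1: center the data by subtracting each feature's mean.
    X_c = X - X.mean(axis=0)

    # Step 2: covariance matrix with the unbiased (n - 1) denominator.
    n = X_c.shape[0]
    C = X_c.T @ X_c / (n - 1)

    # Step 3: eigen-decomposition. eigh is meant for symmetric matrices
    # and returns the eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)

    # Step 4: sort in descending order and keep the top k eigenvectors.
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    W = eigvecs[:, order[:k]]

    # Step 5: transform the centered data into the new coordinate system.
    Y = X_c @ W

    # Fraction of the total variance explained by each kept component.
    explained_ratio = eigvals[:k] / eigvals.sum()
    return Y, W, explained_ratio


# Synthetic example: 200 data points with 3 correlated features.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.1]])
X = rng.normal(size=(200, 3)) @ mixing

Y, W, ratio = pca(X, k=2)
print(Y.shape)  # (200, 2)
print(ratio)    # share of variance captured by each of the 2 components
```

One common way to choose k, under the same assumptions, is to keep the smallest number of components whose explained-variance ratios add up to a chosen threshold, such as 0.95.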
