# Nakafa Framework: LLM

URL: https://nakafa.com/en/subject/high-school/11/mathematics/statistics/product-moment-correlation

Source: https://raw.githubusercontent.com/nakafaai/nakafa.com/refs/heads/main/packages/contents/subject/high-school/11/mathematics/statistics/product-moment-correlation/en.mdx

Output docs content for large language models.

---

import { ScatterDiagram } from "@repo/design-system/components/contents/scatter-diagram";

export const metadata = {
  title: "Product Moment Correlation",
  description: "Calculate Pearson's correlation coefficient (r) to measure linear relationships. Learn formulas, scatter plot analysis, and interpret r values from -1 to +1.",
  authors: [{ name: "Nabil Akbarazzima Fatih" }],
  date: "04/30/2025",
  subject: "Statistics",
};

## What Is Product Moment Correlation?

Product Moment Correlation, often called **Pearson Correlation** or simply denoted by **$r$**, is the most commonly used statistical measure of how strong, and in what direction, the **linear relationship** (straight-line pattern) between two quantitative (numeric) variables is.

The value of $r$ tells us whether the two variables tend to move in the same direction (positive), in opposite directions (negative), or whether there is no linear relationship at all.

## Correlation from Scatter Diagrams

The most intuitive way to understand the value of $r$ is by looking at how the data points are scattered on a diagram:

### Strong Positive Correlation

$r$ approaching +1 means both variables tend to move in the same direction. If your data points rise from bottom left to top right and are tightly clustered, the value of $r$ will be close to +1.

### Weak Positive Correlation

$r$ being positive but close to 0 means both variables tend to move in the same direction, but not very strongly. If the points still show an upward trend but are more widely scattered, the value of $r$ is positive but smaller (closer to 0).
### Strong Negative Correlation

$r$ approaching -1 means both variables tend to move in opposite directions. If the points fall from top left to bottom right and are very tightly clustered, the value of $r$ will be close to -1.

### No Linear Correlation

$r$ approaching 0 means the two variables have no linear relationship. When the points are scattered randomly without a clear linear pattern, the value of $r$ will be close to 0.

## How is r Calculated?

The Pearson correlation coefficient ($r$) essentially measures how **synchronously** two variables (X and Y) move relative to their own variations.

**Imagine this:**

1. **Individual Variation:** Each variable (X and Y) has its own variability. Some variables fluctuate a lot (large variation), while others are stable (small variation). This is measured by $SS_{xx}$ for X and $SS_{yy}$ for Y (formulas below).
2. **Joint Variation (Covariance):** We also need to know how X and Y vary _together_. When X increases, does Y also tend to increase? Or decrease? This measure of joint variation is called **covariance**, and it is calculated from $SS_{xy}$ (the sample covariance is $SS_{xy}$ divided by $n - 1$, so the two always share the same sign).
   - If $SS_{xy}$ is large and positive: X and Y often move in the same direction.
   - If $SS_{xy}$ is large and negative: X and Y often move in opposite directions.
   - If $SS_{xy}$ is close to zero: No clear pattern of joint movement.
3. **Standardizing the Measure:** The problem is that the value of $SS_{xy}$ (covariance) is heavily influenced by the units of the data. For example, the covariance between height (cm) and weight (kg) will have a different value if we measure height in meters and weight in grams, even if the relationship is the same.

   To overcome this, we need to **standardize** the covariance measure. This is done by **dividing the covariance ($SS_{xy}$) by a measure of the individual variations** (adjusted using square roots: $\sqrt{SS_{xx} \cdot SS_{yy}}$).

   The result of this division is **$r$**, the Pearson Correlation Coefficient. Because it is standardized, its value will always be between -1 and +1, regardless of the original data units.
   This allows us to compare the strength of linear relationships between different pairs of variables.

So, the value of $r$ is determined by comparing how strongly X and Y move together relative to how much they vary individually.

## Product Moment Correlation Formula

To calculate the value of $r$ precisely, we use formulas involving the **Sum of Squares**:

$$
r = \frac{SS_{xy}}{\sqrt{SS_{xx} \cdot SS_{yy}}}
$$

**What are $SS_{xx}$, $SS_{yy}$, and $SS_{xy}$?** These measure how varied our data is:

1. **$SS_{xx}$ (Sum of Squares for x):** Measures how spread out the x data is from its mean.

   $$
   SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}
   $$

2. **$SS_{yy}$ (Sum of Squares for y):** Measures how spread out the y data is from its mean.

   $$
   SS_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n}
   $$

3. **$SS_{xy}$ (Sum of Products of deviations for x and y):** Measures how x and y vary _together_.

   $$
   SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}
   $$

Key:

- $n$: Number of data pairs (x, y).
- $\sum x_i$, $\sum y_i$: Sum of all x and y values.
- $\sum x_i^2$, $\sum y_i^2$: Sum of the squares of each x and y value.
- $\sum x_i y_i$: Sum of the product of each x and y pair.
- $\bar{x}$, $\bar{y}$: Mean of x and y values.

By calculating these three SS values and plugging them into the formula for $r$, we get the Product Moment Correlation Coefficient.

## Interpreting the Value of r

Once we have the value of $r$, we can interpret its strength and direction using the following general guidelines:

| Value of $r$            | Correlation Strength  | Description                                        |
| :---------------------- | :-------------------- | :------------------------------------------------- |
| $r = 1$                 | Perfect Positive      | All points lie exactly on an upward sloping line.  |
| Positive, close to $1$  | Strong Positive       | Clear and strong positive linear relationship.     |
| Positive, mid-range     | Moderate Positive     | Moderately visible positive linear relationship.   |
| Positive, close to $0$  | Weak Positive         | Very low positive linear relationship.             |
| $r = 0$                 | No Linear Correlation | No linear relationship at all.                     |
| Negative, close to $0$  | Weak Negative         | Very low negative linear relationship.             |
| Negative, mid-range     | Moderate Negative     | Moderately visible negative linear relationship.   |
| Negative, close to $-1$ | Strong Negative       | Clear and strong negative linear relationship.     |
| $r = -1$                | Perfect Negative      | All points lie exactly on a downward sloping line. |
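
As a worked illustration of the calculation, here is a minimal Python sketch that computes the three SS values with the computational (sum-based) forms and then divides to get $r$. The function names `sums_of_squares` and `pearson_r` are hypothetical helpers introduced only for this example, and the height and weight values are made up. The second half re-expresses the same data in meters and grams to check the earlier claim that the joint-variation term depends on the units while $r$ does not.

```python
from math import sqrt


def sums_of_squares(xs, ys):
    """Return (SS_xx, SS_yy, SS_xy) using the sum-based formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return ss_xx, ss_yy, ss_xy


def pearson_r(xs, ys):
    """Return r = SS_xy / sqrt(SS_xx * SS_yy)."""
    ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)
    return ss_xy / sqrt(ss_xx * ss_yy)


# Made-up illustrative data: five people's heights (cm) and weights (kg).
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [50, 56, 63, 65, 76]

ss_xx, ss_yy, ss_xy = sums_of_squares(heights_cm, weights_kg)
r = pearson_r(heights_cm, weights_kg)
print(f"SS_xx = {ss_xx}, SS_yy = {ss_yy}, SS_xy = {ss_xy}")  # 500.0, 386.0, 435.0
print(f"r = {r:.3f}")  # r = 0.990

# Same people, different units: heights in meters, weights in grams.
heights_m = [h / 100 for h in heights_cm]
weights_g = [w * 1000 for w in weights_kg]

_, _, ss_xy_rescaled = sums_of_squares(heights_m, weights_g)
r_rescaled = pearson_r(heights_m, weights_g)

# SS_xy changes with the units (about 4350 instead of 435), but r stays the same.
print(f"rescaled SS_xy = {ss_xy_rescaled:.1f}, rescaled r = {r_rescaled:.3f}")
```

With this data, $r$ is positive and close to 1, which corresponds to the Strong Positive row of the table.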