Summarise:
- The ‘diabetes’ and ‘obesity’ datasets are merged with an inner join on the ‘FIPS’ column, creating a new data frame named ‘obesity_diabetes’. Only rows whose FIPS code appears in both datasets are kept, so each row combines the diabetes and obesity data for one common identifier.
- Histograms of the ‘% DIABETIC’ and ‘% OBESE’ columns of the ‘obesity_diabetes’ data frame are plotted to show how each variable is distributed.
- Distribution of “% DIABETIC”:
- The statistics shown on the “Distribution of % DIABETIC” plot (mean, median, mode, skewness, and kurtosis) suggest that this variable is relatively close to being normally distributed.
- Mean (7.14), Median (7.0), and Mode (6.90) are quite close to each other, which is a positive sign for normality.
- Skewness (0.09) indicates a very slight right skew. It is close to zero, the skewness of a normal distribution, so the distribution is nearly symmetric, although there may be a few outliers on the right side.
- Kurtosis (2.76) is slightly below 3.0, the kurtosis of a normal distribution, which indicates slightly lighter tails. The distribution therefore has somewhat fewer extreme values than a normal distribution would, although a small peak is visible near the skewed region (right side) of the plot.
- Distribution of “% OBESE”:
- Even though mean, median, and mode are close to each other, skewness and kurtosis give a clearer view of the distribution.
- Skewness (-2.69) indicates a strongly left-skewed distribution. The left tail is longer than the right, suggesting that there are many outliers on the left side of the distribution.
- Kurtosis (12.32) is extremely high, far above the normal distribution's value of 3. It indicates much heavier tails than a normal distribution, meaning the data contain many extreme values, i.e., outliers.
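
The merge and the shape statistics discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the two small tables are made-up stand-ins for the real datasets, the column names ‘% DIABETIC’ and ‘% OBESE’ are taken from the summary, and `shape_stats` is a hypothetical helper, not necessarily how the original figures were computed. Kurtosis is calculated with the Pearson definition, so a normal distribution scores 3.0, matching the reference value used in the bullets.

```python
import pandas as pd

# Hypothetical toy tables standing in for the real datasets (values are made up).
diabetes = pd.DataFrame({
    "FIPS": [1001, 1003, 1005, 1007, 1009],
    "% DIABETIC": [6.9, 7.0, 7.4, 6.9, 7.5],
})
obesity = pd.DataFrame({
    "FIPS": [1001, 1003, 1005, 1007, 1011],
    "% OBESE": [18.7, 18.9, 18.3, 10.2, 18.0],
})

# Inner join: only FIPS codes present in both tables survive.
obesity_diabetes = diabetes.merge(obesity, on="FIPS", how="inner")


def shape_stats(series: pd.Series) -> dict:
    """Mean, median, mode, skewness and Pearson kurtosis of a numeric column."""
    centered = series - series.mean()
    m2 = (centered ** 2).mean()  # second central moment (population variance)
    m3 = (centered ** 3).mean()  # third central moment
    m4 = (centered ** 4).mean()  # fourth central moment
    return {
        "mean": series.mean(),
        "median": series.median(),
        "mode": series.mode().iloc[0],  # smallest value if several modes tie
        "skew": m3 / m2 ** 1.5,         # 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2,       # Pearson definition: 3.0 for a normal distribution
    }


summary = {col: shape_stats(obesity_diabetes[col]) for col in ["% DIABETIC", "% OBESE"]}
for col, values in summary.items():
    print(col, {name: round(float(v), 2) for name, v in values.items()})
```

When comparing against the threshold of 3, it is worth checking which convention a library uses: pandas' `Series.kurt()` reports excess kurtosis, for which a normal distribution scores 0 rather than 3, and it applies a sample-size correction, so its output can differ slightly from the moment-based values above.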