Correlation Matrix, Scatter Matrix, and Line Graph with Stock and Inflation Data in Python
I begin with the DataFrame we've been assembling from Yahoo! Finance stock data and Bureau of Labor Statistics inflation data.
And plot!
import matplotlib.pyplot as plt
#line graph of every column in the DataFrame
monthlyChange.plot(figsize=(10,7))
plt.legend()
plt.title("Monthly Percent Change")
plt.ylabel("Percent change", fontsize=14)
plt.xlabel('Year', fontsize=14)
plt.grid(which="major", color='k', linestyle="-", linewidth=0.6)
plt.show()
Correlation shows us how strongly two series move together. A value close to 1 means a strong positive correlation, a value close to -1 means a strong negative correlation, and a value near 0 means little linear relationship. In supervised learning, we often want to pick features that are highly correlated with the target, but we don't want features that are highly correlated with each other. Take a look at the correlation values: which stocks tend to be correlated with others? Can we draw conclusions from that?
monthlyChange.corr()
matrix = monthlyChange.corr()
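To act on the feature-selection point above, one option is to flatten the correlation matrix into a ranked list of pairs. The following is a minimal sketch rather than part of the original workflow: the DataFrame is synthetic, and the column names (BANK_A, BANK_B, RETAIL_C) and the 0.8 threshold are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for monthlyChange; column names are hypothetical.
rng = np.random.default_rng(0)
base = rng.normal(0, 1, 120)
monthly_change_demo = pd.DataFrame({
    "BANK_A": base + rng.normal(0, 0.2, 120),
    "BANK_B": base + rng.normal(0, 0.2, 120),
    "RETAIL_C": rng.normal(0, 1, 120),
})

corr_demo = monthly_change_demo.corr()

# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr_demo.shape, dtype=bool), k=1)
pairs = corr_demo.where(mask).stack().dropna()

# Rank pairs by absolute correlation, strongest first.
pairs = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(pairs)

# Pairs above an arbitrary threshold are candidates for dropping one feature.
threshold = 0.8
redundant_pairs = pairs[pairs.abs() > threshold]
print(redundant_pairs)
```

Swapping monthly_change_demo for monthlyChange should give the same kind of ranking for the real data. The threshold is a judgment call, not a rule.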
Now, create a correlation heatmap. With the 'Greens' colormap, darker shades of green indicate higher correlation values.
#plotting correlation matrix
plt.imshow(matrix, cmap='Greens')
#adding colorbar
plt.colorbar()
#extracting variable names
variables = list(matrix.columns)
#adding labels to the matrix
plt.xticks(range(len(matrix)), variables, rotation=45, ha='right')
plt.yticks(range(len(matrix)), variables)
#display the plot
plt.show()
The scatter matrix gives us a lot of information. It plots the monthly change of every pair of columns against each other in a grid. Have a look at the correlation between Key Bank and Huntington Bank, for instance. Are these too closely correlated?
#scatter
import numpy as np
from pandas.plotting import scatter_matrix
axes = scatter_matrix(monthlyChange, alpha=0.5, figsize=(6, 6), diagonal='kde')
corr = monthlyChange.corr().values
#annotate each panel above the diagonal with its correlation coefficient
for i, j in zip(*np.triu_indices_from(axes, k=1)):
    axes[i, j].annotate('%.3f' % corr[i, j], (0.8, 0.8),
                        xycoords='axes fraction', ha='center', va='center')
plt.show()
If two columns have an exponential relationship, consider applying a logarithmic transform to one of them so the relationship becomes closer to linear.
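As a rough illustration of the logarithmic scaling idea, the sketch below uses made-up data in which y grows exponentially with x. The names demo, x, y, and log_y are hypothetical and do not come from the stock DataFrame.

```python
import numpy as np
import pandas as pd

# Synthetic example: y grows exponentially with x (values are made up).
x = np.linspace(1, 10, 50)
demo = pd.DataFrame({"x": x, "y": 2.0 * np.exp(0.5 * x)})

# Pearson correlation measures linear association,
# so the curved relationship scores below 1.
raw_corr = demo["x"].corr(demo["y"])

# After the transform, log(y) = log(2) + 0.5 * x is linear in x.
demo["log_y"] = np.log(demo["y"])
log_corr = demo["x"].corr(demo["log_y"])

print(f"raw: {raw_corr:.3f}, log-scaled: {log_corr:.3f}")
```

The logarithm is only defined for positive values, so this kind of transform suits price levels rather than percent changes, which can be negative.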
Where a row and a column intersect at the same symbol, the cell shows a distribution graph (a kernel density estimate, since we passed diagonal='kde'). These are the cells that look like line charts, as opposed to the other cells, which are scatter plots.