Stock movement clusters
- saman aboutorab
- Jan 3, 2024
Updated: Jan 8, 2024
In this project, we'll cluster companies using their daily stock price movements (i.e. the dollar difference between the closing and opening prices for each trading day). The NumPy array holds the daily price movements from 2010 to 2015 (obtained from Yahoo! Finance), where each row corresponds to a company and each column corresponds to a trading day.
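
To make the definition of a movement concrete, here is a minimal sketch using made-up prices. The arrays `open_prices` and `close_prices` and the two companies are hypothetical and are not taken from the dataset. The last step rescales each row to unit length, which is what scikit-learn's `Normalizer` does with its default `norm='l2'`, so that a stock with large dollar swings does not dominate one with small swings.

```python
import numpy as np

# Hypothetical opening and closing prices (in dollars) for two made-up
# companies over three trading days; these are NOT values from the dataset.
open_prices = np.array([
    [10.0, 10.5, 11.0],   # company A
    [50.0, 49.0, 48.5],   # company B
])
close_prices = np.array([
    [10.5, 10.25, 11.0],  # company A
    [49.0, 49.5, 47.5],   # company B
])

# Daily movement = closing price minus opening price.
# Rows are companies, columns are trading days.
toy_movements = close_prices - open_prices
print(toy_movements)

# Rescale each row (company) to unit Euclidean length, mirroring the
# default behaviour of sklearn.preprocessing.Normalizer (norm='l2').
row_norms = np.linalg.norm(toy_movements, axis=1, keepdims=True)
unit_rows = toy_movements / row_norms
print(unit_rows)
```

After this rescaling, two companies whose movements rise and fall together end up close to each other even if their share prices differ greatly in scale.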

Import Libraries
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
# Import Normalizer
from sklearn.preprocessing import Normalizer
# Import normalize
from sklearn.preprocessing import normalize
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt
# Import TSNE
from sklearn.manifold import TSNE

Dataset
stock_df = pd.read_csv('company-stock-movements-2010-2015-incl.csv')
stock_df.rename(columns={'Unnamed: 0':'companies'}, inplace=True)
X_stock_df = stock_df.drop(['companies'], axis=1)
movements = X_stock_df.to_numpy()

Fit Pipeline
# Create a normalizer: normalizer
normalizer = Normalizer()
# Create a KMeans model with 10 clusters: kmeans
kmeans = KMeans(n_clusters=10)
# Make a pipeline chaining normalizer and kmeans: pipeline
pipeline = make_pipeline(normalizer, kmeans)
# Fit pipeline to the daily price movements
pipeline.fit(movements)

Predict
# Predict the cluster labels: labels
labels = pipeline.predict(movements)
# Create a DataFrame aligning labels and companies: df
companies = stock_df['companies']
df = pd.DataFrame({'labels': labels, 'companies': companies})
# Display df sorted by cluster label
print(df.sort_values('labels'))

Hierarchies of stocks
movements_companies = movements[:60]
len(companies)  # 60
# Normalize the movements: normalized_movements
normalized_movements = normalize(movements_companies)
# Create a TSNE instance: model
model = TSNE(learning_rate=50)
# Apply fit_transform to normalized_movements: tsne_features
tsne_features = model.fit_transform(normalized_movements)
# Select the 0th feature: xs
xs = tsne_features[:,0]
# Select the 1st feature: ys
ys = tsne_features[:,1]
# Scatter plot
plt.scatter(xs, ys, alpha=0.5)
# Annotate the points
for x, y, company in zip(xs, ys, companies):
    plt.annotate(company, (x, y), fontsize=5, alpha=0.75)
plt.show()
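
The imports at the top include `linkage` and `dendrogram`, but the code above does not use them. As a hedged illustration of how a hierarchy of stocks could be built with them, here is a minimal sketch on synthetic data. The array `toy_movements`, the names in `toy_companies`, and the choice of `method="complete"` are all assumptions made for this example, not something taken from the post.

```python
import numpy as np
from sklearn.preprocessing import normalize
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Synthetic stand-in for the movements array: 6 made-up companies over
# 20 trading days, built as two groups of three similar rows each.
rng = np.random.default_rng(0)
base_a = rng.normal(size=20)
base_b = rng.normal(size=20)
toy_movements = np.vstack(
    [base_a + 0.1 * rng.normal(size=20) for _ in range(3)]
    + [base_b + 0.1 * rng.normal(size=20) for _ in range(3)]
)
toy_companies = ["A1", "A2", "A3", "B1", "B2", "B3"]  # hypothetical names

# Rescale each row to unit length, as in the t-SNE step above.
normalized = normalize(toy_movements)

# Agglomerative clustering; "complete" linkage is an assumption here.
mergings = linkage(normalized, method="complete")

# no_plot=True returns the dendrogram layout without drawing it;
# leave it at the default (False) to draw the tree with matplotlib.
tree = dendrogram(mergings, labels=toy_companies, no_plot=True)
print(tree["ivl"])  # leaf labels in dendrogram order

# Cut the tree into at most two flat clusters.
flat_labels = fcluster(mergings, t=2, criterion="maxclust")
print(flat_labels)
```

On the real data, `normalized_movements` and `companies` from the code above would take the place of the synthetic arrays.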