Stock movement clusters

Saman Aboutorab
Jan 3, 2024

Updated: Jan 8, 2024

In this project, we'll cluster companies using their daily stock price movements (i.e., the dollar difference between the closing and opening prices for each trading day). The dataset contains daily price movements from 2010 to 2015 (obtained from Yahoo! Finance). Once it is loaded into a NumPy array, each row corresponds to a company and each column corresponds to a trading day.
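To make the definition of a "movement" concrete, here is a minimal sketch that builds a company-by-day table from open and close prices. The tickers, dates, and prices are made up for illustration and are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical open/close prices for two made-up tickers over three trading days.
prices = pd.DataFrame(
    {
        "company": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
        "date": ["2010-01-04", "2010-01-05", "2010-01-06"] * 2,
        "open": [10.0, 10.5, 10.2, 50.0, 49.0, 51.0],
        "close": [10.5, 10.1, 10.2, 49.0, 50.5, 51.5],
    }
)

# Daily movement: closing price minus opening price, in dollars.
prices["movement"] = prices["close"] - prices["open"]

# Pivot so each row is a company and each column is a trading day.
movement_table = prices.pivot(index="company", columns="date", values="movement")
print(movement_table)
```

The resulting table has the same layout as the dataset used in this post: one row per company, one column per trading day.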

Import Libraries

import pandas as pd

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Import Normalizer
from sklearn.preprocessing import Normalizer
# Import normalize
from sklearn.preprocessing import normalize

from scipy.cluster.hierarchy import linkage, dendrogram

import matplotlib.pyplot as plt

# Import TSNE
from sklearn.manifold import TSNE

Dataset

stock_df = pd.read_csv('company-stock-movements-2010-2015-incl.csv')

stock_df.rename(columns={'Unnamed: 0':'companies'}, inplace=True)

X_stock_df = stock_df.drop(['companies'], axis=1)

movements = X_stock_df.to_numpy()

Fit Pipeline

# Create a normalizer: normalizer
normalizer = Normalizer()

# Create a KMeans model with 10 clusters: kmeans
kmeans = KMeans(n_clusters=10)

# Make a pipeline chaining normalizer and kmeans: pipeline
pipeline = make_pipeline(normalizer, kmeans)

# Fit pipeline to the daily price movements
pipeline.fit(movements)
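Why Normalizer rather than StandardScaler? Normalizer rescales each row (here, each company) to unit Euclidean length, so companies are compared by the shape of their movements rather than by their dollar magnitude. StandardScaler, by contrast, standardizes each column. A small sketch on made-up numbers illustrates the row-wise behaviour:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Two made-up "companies" with the same movement pattern at different dollar scales.
toy_movements = np.array([
    [1.0, -2.0, 2.0],
    [10.0, -20.0, 20.0],
])

# Normalizer works row by row and has no parameters to learn from the data.
unit_rows = Normalizer().fit_transform(toy_movements)

# Each row now has Euclidean length 1, so the two rows become identical.
row_norms = np.linalg.norm(unit_rows, axis=1)
print(unit_rows)
print(row_norms)
```

After normalization the expensive stock and the cheap stock look the same to KMeans, which is the desired behaviour when clustering by movement pattern.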

Predict

# Predict the cluster labels: labels
labels = pipeline.predict(movements)

# Create a DataFrame aligning labels and companies: df
companies = stock_df['companies']
df = pd.DataFrame({'labels': labels, 'companies': companies})

# Display df sorted by cluster label
print(df.sort_values('labels'))
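A sorted printout can be hard to scan. One possible alternative is to group the company names by cluster label. The sketch below uses a small made-up DataFrame (toy_df) with the same two columns as df, so the labels and names are illustrative only.

```python
import pandas as pd

# Made-up labels and tickers, standing in for the `df` built above.
toy_df = pd.DataFrame(
    {"labels": [2, 0, 2, 1, 0], "companies": ["AAA", "BBB", "CCC", "DDD", "EEE"]}
)

# Collect the company names that share each cluster label.
members = toy_df.groupby("labels")["companies"].apply(list)

for label, names in members.items():
    print(label, names)
```

Note that KMeans cluster numbers are arbitrary, and the assignments can change between runs unless random_state is fixed when creating the model.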

t-SNE map of stocks

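# Use the first 60 rows of movements (one row per company)
movements_companies = movements[:60]

# Check that the number of company names matches the number of rows
print(len(companies), len(movements_companies))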

# Normalize the movements: normalized_movements
normalized_movements = normalize(movements_companies)

# Create a TSNE instance: model
model = TSNE(learning_rate=50)

# Apply fit_transform to normalized_movements: tsne_features
tsne_features = model.fit_transform(normalized_movements)

# Select the 0th feature: xs
xs = tsne_features[:,0]

# Select the 1st feature: ys
ys = tsne_features[:,1]

# Scatter plot
plt.scatter(xs, ys, alpha=0.5)

# Annotate the points
for x, y, company in zip(xs, ys, companies):
    plt.annotate(company, (x, y), fontsize=5, alpha=0.75)
plt.show()
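The linkage and dendrogram functions imported at the top are not used in the code above. As a rough sketch of how they could be applied to normalized movements, the example below runs hierarchical clustering on synthetic data. The array toy_movements, the names in toy_names, and the choice of complete linkage are assumptions made for illustration, not results from the real dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.preprocessing import normalize

# Synthetic stand-in for the movements array: 6 "companies" x 20 "days",
# built as two groups of three noisy copies of a shared pattern.
rng = np.random.default_rng(0)
base_a = rng.normal(size=20)
base_b = rng.normal(size=20)
toy_movements = np.vstack(
    [base_a + 0.05 * rng.normal(size=20) for _ in range(3)]
    + [base_b + 0.05 * rng.normal(size=20) for _ in range(3)]
)
toy_names = ["A1", "A2", "A3", "B1", "B2", "B3"]

# Normalize each row to unit length, as in the t-SNE step.
normalized = normalize(toy_movements)

# Build the merge tree with complete linkage.
mergings = linkage(normalized, method="complete")

# no_plot=True returns the tree layout without drawing it.
tree = dendrogram(mergings, labels=toy_names, no_plot=True)

# Cut the tree into at most two flat clusters.
flat_labels = fcluster(mergings, t=2, criterion="maxclust")
print(tree["ivl"])
print(flat_labels)
```

To draw the tree instead of only computing it, drop no_plot=True and call plt.show() afterwards. On the real data, movements_companies and companies would take the place of the toy arrays.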

Reference:

https://app.datacamp.com/