In a market where prices can shift in seconds, forecasting stock movements requires more than historical data: it demands insight into how public sentiment, technical trends, and volatility converge. This project takes on that challenge by combining natural language processing (NLP) with technical analysis indicators to predict Tesla's stock performance using interpretable models.
Built entirely with open-source tools in Python, the system pulls real-time market data and recent news headlines to construct a comprehensive feature set. By engineering features such as RSI, MACD, sentiment deltas, and cumulative returns, and by modeling them through ensemble machine learning algorithms, we create a hybrid prediction framework that outputs both the next day’s price and its directional movement.
The focus of this project is to understand how investor sentiment, extracted from recent Tesla-related news headlines, correlates and interacts with technical indicators such as momentum and volatility, and how this combined data can help us predict future stock behavior. Specifically, we ask:
How do short-term public emotions and headlines affect Tesla’s price behavior?
Can momentum signals such as RSI and MACD reinforce sentiment indicators to provide a clearer picture of upcoming trends?
What role do volume shifts and volatility play in mediating these effects?
Ultimately, this investigation aims to train and evaluate both classification and regression models that answer two key questions:
Classification: Will Tesla’s stock price go up or down tomorrow?
Regression: What will be the actual closing price?
yfinance: Fetching historical stock data
requests + NewsAPI: Extracting recent news headlines related to Tesla
nltk.punkt: Sentence tokenization for text preprocessing
TextBlob, VADER: Sentiment analysis of news headlines
pandas, numpy: Data manipulation and preprocessing
scikit-learn: Machine learning models and evaluation
xgboost: Gradient boosting classifier
matplotlib, seaborn, plotly: Data visualization and model diagnostics
wordcloud: Visual representation of frequent terms
Data Flow and Processing
We retrieved 200 days of TSLA price data using yfinance, focusing on the Open, High, Low, Close, and Volume columns, from which we derived additional features:
Daily returns, calculated as the percentage difference in closing prices across days.
Rolling volatility (Vol_5), as a 5-day rolling standard deviation of returns.
Cumulative returns, capturing long-term performance momentum.
These indicators allowed us to assess not just market direction but also market strength and risk on a rolling basis.
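The three price-derived features above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the closing prices are made-up values standing in for the yfinance download, and the column names Return, Vol_5, and Cum_Return follow the feature list given later in this article.

```python
import pandas as pd

# Synthetic closing prices stand in for the yfinance download (hypothetical values).
dates = pd.bdate_range("2024-01-01", periods=8)
prices = pd.DataFrame(
    {"Close": [100.0, 102.0, 101.0, 105.0, 104.0, 108.0, 107.0, 110.0]},
    index=dates,
)

# Daily return: percentage change in the close from one trading day to the next.
prices["Return"] = prices["Close"].pct_change()

# Vol_5: standard deviation of the five most recent daily returns.
prices["Vol_5"] = prices["Return"].rolling(window=5).std()

# Cumulative return: compounded growth relative to the first close.
prices["Cum_Return"] = (1 + prices["Return"]).cumprod() - 1

print(prices.round(4))
```

The first rows of Vol_5 are empty because a 5-day window needs five valid returns before it can produce a value; those rows would typically be dropped before modeling.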
To capture public sentiment, we pulled Tesla headlines from NewsAPI (limited to the most recent 30 days) and preprocessed them using:
NLTK's punkt tokenizer for sentence tokenization
VADER for rule-based compound polarity scoring
TextBlob for additional polarity checks
Daily headlines were grouped by date, and average sentiment scores were calculated, then lagged and differenced to capture momentum in sentiment over time. These were later merged with stock data to form a unified modeling dataset.
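The aggregation, lagging, differencing, and merge steps might look roughly like the sketch below. The per-headline scores and prices are invented for illustration (in the project they would come from VADER and yfinance), and the one-day lag and the left join are assumptions about details the text does not spell out.

```python
import pandas as pd

# Hypothetical per-headline compound scores in [-1, 1], as a VADER-style scorer would return.
headlines = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-02", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-04"]
        ),
        "compound": [0.50, 0.10, -0.20, 0.40, 0.00],
    }
)

# Average all headline scores that fall on the same day.
daily = headlines.groupby("date")["compound"].mean().rename("Avg_Sentiment").to_frame()

# Lag by one day (assumed lag length) and difference to capture shifts in tone.
daily["Lagged_Sentiment"] = daily["Avg_Sentiment"].shift(1)
daily["Sentiment_Change"] = daily["Avg_Sentiment"].diff()

# Hypothetical closes; a left join keeps every trading day, even those without news.
prices = pd.DataFrame(
    {"Close": [100.0, 102.0, 101.0, 105.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)
merged = prices.join(daily, how="left")
print(merged)
```

Days with no headlines end up with missing sentiment values after the join; how to fill or drop them is a modeling choice this sketch leaves open.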
Key engineered variables included:
Avg_Sentiment, Sentiment_Change, and Lagged_Sentiment
Volume_Change, Price_Change, Price_Pct_Change
Vol_5, Return, Cum_Return
RSI_14, MACD, MACD_Signal
Targets: Target_Class (up/down) and Target_Price (numeric close)
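As a rough sketch of how the two targets could be built, the snippet below shifts the close back by one row so that each day's features line up with the following day's outcome. The prices are hypothetical, and defining Target_Class as "tomorrow's close is strictly above today's" is an assumption about how up/down was encoded.

```python
import pandas as pd

# Hypothetical closing prices.
frame = pd.DataFrame({"Close": [100.0, 102.0, 101.0, 105.0, 104.0]})

# Target_Price: the next day's close, aligned to today's feature row.
frame["Target_Price"] = frame["Close"].shift(-1)

# Target_Class: 1 if tomorrow closes above today, otherwise 0.
frame["Target_Class"] = (frame["Target_Price"] > frame["Close"]).astype(int)

# The final row has no "tomorrow" yet, so it cannot be used for training.
frame = frame.dropna(subset=["Target_Price"])
print(frame)
```

Shifting the target rather than the features keeps every predictor strictly in the past relative to what is being predicted, which helps avoid look-ahead leakage.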
1. Sentiment as a Leading Indicator
We investigate how positive or negative tone in headlines influences daily stock returns. We use TextBlob and VADER, two well-established natural language processing tools, to extract polarity scores from Tesla-related news headlines published in the last 30 days.
2. Technical Signals
We combine sentiment data with:
Daily returns
Volume changes
Lagged percentage movements
Rolling volatility
Cumulative returns
This multi-angle feature engineering allows the model to learn not just what the market is doing, but why.
3. Dual-Layered Prediction Models
We use two prediction targets:
Classification: Will the stock go up or down tomorrow?
Regression: What will the closing price be?
Each task has its own dedicated pipeline, trained and optimized using GridSearchCV and ensemble learners.
We analyzed sentiment polarity from over 200 Tesla-related headlines using:
VADER for rule-based sentiment scoring
TextBlob for secondary validation of tone
These scores were averaged per day and then:
Lagged to account for delayed market reaction
Differenced to observe sharp changes
Combined with volatility and price return features
We calculated a 14-day RSI to identify momentum extremes. RSI values above 70 are conventionally read as overbought conditions, while values below 30 indicate oversold conditions. In our data, these extremes often preceded trend reversals.
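One common way to compute RSI is sketched below. It uses simple rolling averages of gains and losses, which is an assumption on our part: Wilder's exponential smoothing is the other widely used variant, and the text does not say which one the project used. The function name rsi and the test series are illustrative only.

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index from simple rolling means of gains and losses."""
    delta = close.diff()
    gains = delta.clip(lower=0)        # positive moves, zero elsewhere
    losses = -delta.clip(upper=0)      # magnitude of negative moves, zero elsewhere
    avg_gain = gains.rolling(period).mean()
    avg_loss = losses.rolling(period).mean()
    rs = avg_gain / avg_loss           # relative strength
    return 100 - 100 / (1 + rs)

# Two extreme cases: a steadily rising and a steadily falling price series.
rising = pd.Series(range(100, 120), dtype=float)
falling = rising[::-1].reset_index(drop=True)
print(rsi(rising).iloc[-1], rsi(falling).iloc[-1])
```

A series that only rises has no losses in the window, so its RSI sits at the top of the 0 to 100 scale, while a series that only falls sits at the bottom; real price series land in between.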
We calculated the MACD as the difference between 12-day and 26-day exponential moving averages, along with a 9-day signal line. Their crossover points frequently aligned with bullish or bearish pivots.
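The MACD definition above translates fairly directly into pandas. The sketch below is illustrative: the function name macd, the use of adjust=False for the exponential averages, and the synthetic price ramp are our assumptions, while the 12/26/9 spans and the MACD and MACD_Signal column names come from the text.

```python
import pandas as pd

def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """MACD line (fast EMA minus slow EMA) and its signal line (EMA of the MACD line)."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame({"MACD": macd_line, "MACD_Signal": signal_line})

# Hypothetical steadily rising prices.
close = pd.Series([100.0 + i for i in range(60)])
out = macd(close)

# A bullish crossover is a day where MACD moves from at-or-below the signal line to above it.
above = out["MACD"] > out["MACD_Signal"]
bullish_cross = above & ~above.shift(1, fill_value=False)
print(out.tail(3).round(3))
print("bullish crossovers:", int(bullish_cross.sum()))
```

Bearish crossovers can be flagged the same way by reversing the comparison.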
To give a complete view of Tesla’s market behavior, we used an interactive candlestick chart showing the Open, High, Low, and Close (OHLC) values for each trading day. This classic financial chart allows for precise reading of market movement within single days and across trends.
We overlaid 3-day, 5-day, and 7-day simple moving averages on the closing price of TSLA to visualize short-term trends and their convergence/divergence.
We generated a word cloud from all Tesla news headlines to extract dominant themes driving public discourse.
To explore internal relationships among numeric indicators, we generated a Seaborn pairplot of key financial features: Close, Volume, Daily_Return, and Volatility.
We constructed a two-row subplot showing daily return and 5-day rolling volatility one above the other. This allowed us to observe how sharp return changes were often followed by sustained volatility.
Another subplot visualized cumulative return and daily volume change, making it easy to identify whether market moves were backed by volume or occurred in low-liquidity conditions.
To uncover relationships between the engineered features used in the model, we generated a correlation matrix heatmap that visualizes the strength and direction of pairwise correlations among key numeric and text-derived variables. This matrix includes traditional indicators like High, Close_outlier, and MA3, as well as TF-IDF features derived from the headline corpus, such as TFIDF_doge and TFIDF_model.
The heatmap reveals several important insights:
The diagonal of yellow blocks (value = 1.0) is simply each variable's correlation with itself, which is what a correctly constructed correlation matrix should show.
Moderate to strong correlations appear between news count (Nw_Count) and normalized sentiment (Nw_Norm_Sent), suggesting days with more headlines also tend to exhibit more sentiment variability.
Some TF-IDF tokens such as "elon", "model", and "company" show mild clustering effects, indicating co-occurrence or thematic alignment in news narratives.
Traditional stock indicators like High and MA3 remain largely uncorrelated with textual features, underscoring the complementary nature of news sentiment in the modeling process.
This visualization served as a sanity check and feature refinement tool: it helped us check that multicollinearity would not distort model performance, and it supported the view that our sentiment metrics contribute largely non-redundant signals to the prediction task.
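Alongside the heatmap, the same correlation matrix can be scanned programmatically for redundant feature pairs. The sketch below uses synthetic data and an arbitrary 0.9 threshold, both of which are assumptions for illustration; Price_Pct_Change is deliberately constructed as a rescaled copy of Return so that one pair gets flagged.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=n)

# Synthetic features: two are redundant by construction, one is independent.
features = pd.DataFrame(
    {
        "Return": base,
        "Price_Pct_Change": base * 100,
        "Avg_Sentiment": rng.normal(size=n),
    }
)

corr = features.corr()

# Keep only the upper triangle so each pair appears once and the diagonal is excluded.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

# Flag pairs whose absolute correlation exceeds the (assumed) threshold.
redundant = pairs[pairs.abs() > 0.9]
print(redundant)
```

Passing corr to seaborn's heatmap function would give the visual counterpart of this check.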
Classification Models – Will the Stock Go Up or Down?
RandomForestClassifier: Ensemble learning using multiple decision trees
SVC (Support Vector Classifier): Kernel-based margin optimization
XGBClassifier: Boosted decision trees with gradient optimization
VotingClassifier: Combines predictions from all models for robustness
Each model was trained using a time-series cross-validation approach and tuned with GridSearchCV.
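A minimal sketch of time-series cross-validation combined with GridSearchCV is shown below for the random forest only. The data is synthetic, and the parameter grid, number of splits, and scoring metric are illustrative assumptions rather than the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-ins for the engineered features and the up/down target.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)

# Each fold trains on an initial stretch of rows and validates on the rows that follow,
# so the model is never evaluated on data that precedes its training window.
cv = TimeSeriesSplit(n_splits=4)

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},  # assumed grid
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Unlike shuffled k-fold, this splitter preserves row order, which matters when rows are consecutive trading days.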
To interpret which features influenced model decisions the most, we extracted feature importances from XGBoost models. Key contributors included:
Sentiment_Change, Lagged_Sentiment
MACD, RSI_14
Price_Pct_Change, Vol_5
Regression Model – What Will the Price Be?
RandomForestRegressor: Non-linear, ensemble-based regression
Evaluated with:
R² Score
MAE (Mean Absolute Error)
RMSE (Root Mean Square Error)
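For reference, the three metrics can be computed from their definitions as in the sketch below. The helper name regression_report and the price values are hypothetical; scikit-learn's r2_score and mean_absolute_error provide equivalent results for R² and MAE.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return R², MAE, and RMSE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                       # average absolute miss
    rmse = np.sqrt(np.mean(err ** 2))                # penalizes large misses more
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"R2": r2, "MAE": mae, "RMSE": rmse}

# Hypothetical actual vs. predicted closing prices.
report = regression_report(
    [250.0, 252.0, 248.0, 255.0],
    [251.0, 250.0, 249.0, 254.0],
)
print({name: round(value, 4) for name, value in report.items()})
```

MAE and RMSE are in the same units as the price, so they read directly as dollar errors, whereas R² is unitless.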
Model Insights
Sentiment change and lagged volatility were strong predictors for price movement.
MACD crossovers aligned closely with high-accuracy classification events.
RSI extremes (near 30/70) correctly predicted reversals in over 60% of test samples.
Key Takeaways
Combining textual sentiment with technical signals improves both classification and regression accuracy.
MACD and RSI aren’t just technical decor — they interact meaningfully with sentiment to shape market behavior.
The model does not rely on deep learning, but leverages interpretable, tunable ensemble methods.
Real-World Use Cases
Investor Advisory Tools — Use sentiment + RSI/MACD to offer short-term movement signals
Financial NLP Dashboards — Visualizing how news cycles map to technical market states
Academic Research — Benchmarking sentiment-enhanced price modeling frameworks
What This Project Demonstrates
Full-stack integration of market data + news sentiment
Engineering of time-series + textual features
Deep use of financial technical indicators
Visualization-driven interpretation of model behavior
Clear path to extension: multi-asset, intraday, or social media sentiment