Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting
arXiv:2603.09085v1 Announce Type: new Abstract: By capturing the prevailing sentiment and market mood, textual data has become increasingly vital for forecasting commodity prices, particularly in metal markets. However, the effectiveness of lightweight, finetuned large language models (LLMs) in extracting predictive signals for aluminum prices, and the specific market conditions under which these signals are most informative, remains under-explored. This study generates monthly sentiment scores from English and Chinese news headlines (Reuters, Dow Jones Newswires, and China News Service) and integrates them with traditional tabular data, including base metal indices, exchange rates, inflation rates, and energy prices. We evaluate the predictive performance and economic utility of these models through long-short simulations on the Shanghai Metal Exchange from 2007 to 2024. Our results demonstrate that during periods of high volatility, Long Short-Term Memory (LSTM) models incorporating sentiment data from a finetuned Qwen3 model (Sharpe ratio 1.04) significantly outperform baseline models using tabular data alone (Sharpe ratio 0.23). Subsequent analysis elucidates the nuanced roles of news sources, topics, and event types in aluminum price forecasting.
Executive Summary
This study examines how well lightweight, finetuned large language models (LLMs) extract predictive signals for aluminum prices from English and Chinese news headlines. It combines the resulting monthly sentiment scores with traditional tabular data (base metal indices, exchange rates, inflation rates, and energy prices). The researchers evaluate predictive performance and economic utility through long-short simulations on the Shanghai Metal Exchange from 2007 to 2024. During periods of high volatility, LSTM models that incorporate sentiment from a finetuned Qwen3 model reach a Sharpe ratio of 1.04, against 0.23 for baseline models using tabular data alone. The study also analyzes the roles of news sources, topics, and event types in aluminum price forecasting. The findings are relevant to investors and policymakers seeking to improve commodity price forecasting and market analysis.
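To make the evaluation metric concrete, the following is a minimal sketch of a sign-based long-short simulation and an annualized Sharpe ratio. It is not the paper's procedure: the trading rule, the monthly annualization factor, the zero risk-free rate, and all names and toy numbers here are assumptions chosen for illustration.

```python
import math
from statistics import mean, stdev


def long_short_returns(forecasts, realized):
    """Hypothetical rule: long (+1) if the forecast return is positive,
    short (-1) if negative, flat (0) if exactly zero."""
    positions = [(f > 0) - (f < 0) for f in forecasts]
    return [p * r for p, r in zip(positions, realized)]


def annualized_sharpe(returns, periods_per_year=12, risk_free=0.0):
    """Mean excess return over its sample standard deviation,
    scaled by sqrt(periods_per_year). Assumes periodic (here monthly) returns."""
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)
    if sd == 0:
        return float("nan")
    return mean(excess) / sd * math.sqrt(periods_per_year)


# Toy monthly data (invented for illustration, not taken from the study).
forecasts = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03]
realized = [0.03, -0.02, -0.01, -0.01, 0.02, 0.01]

strategy = long_short_returns(forecasts, realized)
sharpe = annualized_sharpe(strategy)
print(strategy, round(sharpe, 2))
```

Comparing two models under this setup amounts to running the same simulation on each model's forecasts and comparing the resulting ratios, which is the kind of comparison behind the 1.04 versus 0.23 figures quoted in the abstract.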
Key Points
- ▸ Finetuned LLMs can extract predictive signals for aluminum prices from news data.
- ▸ During periods of high volatility, LSTM models incorporating sentiment from a finetuned Qwen3 model outperform tabular-only baselines (Sharpe ratio 1.04 versus 0.23).
- ▸ News sources, topics, and event types play nuanced roles in aluminum price forecasting.
Merits
Strength in Predictive Performance
The study shows that sentiment from a finetuned LLM can improve on tabular-only forecasts, with the reported gain (Sharpe ratio 1.04 versus 0.23) arising during periods of high volatility. This points to the potential of such models in commodity price forecasting.
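One way to examine regime-dependent performance is to label each month as high or low volatility and evaluate a strategy separately within each group. The sketch below assumes a trailing-window standard deviation and a full-sample median threshold. Both are illustrative choices, not the study's definition of high volatility, and the full-sample median uses information that would not be available in real time.

```python
from statistics import median, stdev


def rolling_volatility(returns, window=3):
    """Sample std dev of the trailing `window` returns (current month included).
    Entries are None until enough history exists."""
    vols = []
    for i in range(len(returns)):
        if i + 1 < window:
            vols.append(None)
        else:
            vols.append(stdev(returns[i + 1 - window : i + 1]))
    return vols


def split_by_regime(values, vols):
    """Split `values` into (high, low) volatility groups.
    The threshold is the median of the available volatilities (an assumption)."""
    threshold = median(v for v in vols if v is not None)
    high = [x for x, v in zip(values, vols) if v is not None and v > threshold]
    low = [x for x, v in zip(values, vols) if v is not None and v <= threshold]
    return high, low


# Toy monthly price returns and strategy returns (invented for illustration).
price_returns = [0.01, -0.01, 0.02, 0.08, -0.09, 0.07, 0.01, 0.00]
strategy_returns = [0.00, 0.01, -0.01, 0.02, 0.05, 0.06, -0.04, 0.01]

vols = rolling_volatility(price_returns, window=3)
high, low = split_by_regime(strategy_returns, vols)
print(high, low)
```

Any performance metric, such as a Sharpe ratio or a forecast error, can then be computed on `high` and `low` separately to see where a sentiment-augmented model adds value.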
Insights into News Data
The study analyzes how the predictive value of sentiment depends on news source, topic, and event type. This helps identify which kinds of news carry a usable signal and how such data might be applied in market analysis.
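The idea of conditioning sentiment on source or topic can be illustrated with a small aggregation sketch. It assumes each headline already carries a sentiment score in [-1, 1] along with source and topic labels. The field names, topic labels, scores, and the simple averaging rule are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scored headlines; in practice the scores would come from a sentiment model.
headlines = [
    {"date": "2021-03-02", "source": "Reuters", "topic": "supply", "score": -0.5},
    {"date": "2021-03-15", "source": "Reuters", "topic": "demand", "score": 0.5},
    {"date": "2021-03-20", "source": "China News Service", "topic": "supply", "score": -0.25},
    {"date": "2021-04-05", "source": "Dow Jones Newswires", "topic": "policy", "score": 0.5},
    {"date": "2021-04-18", "source": "Reuters", "topic": "supply", "score": 0.25},
]


def monthly_sentiment(records, group_field):
    """Average headline score per (YYYY-MM, group) pair,
    where the group is the value of `group_field` (e.g. "topic" or "source")."""
    buckets = defaultdict(list)
    for rec in records:
        month = rec["date"][:7]  # ISO dates: the first 7 characters are YYYY-MM
        buckets[(month, rec[group_field])].append(rec["score"])
    return {key: mean(scores) for key, scores in buckets.items()}


by_topic = monthly_sentiment(headlines, "topic")
by_source = monthly_sentiment(headlines, "source")
print(by_topic[("2021-03", "supply")])
```

Each (month, group) average could then serve as a separate input feature, so that a forecasting model can weight, for example, supply-related news differently from policy news.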
Demerits
Limited Dataset
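The study relies on a dataset spanning 2007 to 2024. Because sentiment is scored monthly, this window yields at most roughly 216 observations, a small sample for training and testing neural models. It may also miss structural changes in the market outside that period, potentially limiting the generalizability of the findings.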
Dependence on Specific LLM
The study's results are contingent on the performance of the finetuned Qwen3 model, which may not generalize to other LLMs or datasets, highlighting the need for further research on the robustness and transferability of these models.
Expert Commentary
The findings show that finetuned LLMs can add predictive value in commodity price forecasting. However, the results depend on one specific LLM (a finetuned Qwen3 model) and on a single sample period, and conclusions drawn for aluminum may not carry over to other commodities or markets. Nevertheless, the analysis of news sources, topics, and event types is valuable for investors and policymakers seeking to improve market analysis and forecasting.
Recommendations
- ✓ Future studies should investigate the robustness and transferability of finetuned LLMs across different datasets and commodities.
- ✓ Researchers should explore further applications of natural language processing, such as opinion mining, in market analysis and forecasting.