LaSTR: Language-Driven Time-Series Segment Retrieval
arXiv:2603.00725v1. Abstract: Effectively searching time-series data is essential for system analysis, but existing methods often require expert-designed similarity criteria or rely on global, series-level descriptions. We study language-driven segment retrieval: given a natural language query, the goal is to retrieve relevant local segments from large time-series repositories. We build large-scale segment–caption training data by applying TV2-based segmentation to LOTSA windows and generating segment descriptions with GPT-5.2, and then train a Conformer-based contrastive retriever in a shared text–time-series embedding space. On a held-out test split, we evaluate single-positive retrieval together with caption-side consistency (SBERT and VLM-as-a-judge) under multiple candidate pool sizes. Across all settings, LaSTR outperforms random and CLIP baselines, yielding improved ranking quality and stronger semantic agreement between retrieved segments and query intent.
Executive Summary
The article introduces LaSTR, a language-driven time-series segment retrieval system. Given a natural language query, it searches time-series repositories and returns relevant local segments rather than whole series. LaSTR is trained on large-scale segment–caption data and uses a Conformer-based contrastive retriever that maps text and time-series segments into a shared embedding space. On a held-out test split, it outperforms random and CLIP baselines, with better ranking quality and stronger semantic agreement between retrieved segments and query intent. Because queries are written in natural language, the approach avoids the expert-designed similarity criteria and global, series-level descriptions that existing search methods depend on.
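The summary names a contrastive retriever but does not state its exact loss. A common objective for aligning two modalities in a shared space, and the one CLIP uses, is a symmetric InfoNCE loss. The NumPy sketch below assumes that objective, L2-normalized embeddings, and a temperature of 0.07. None of these choices is confirmed by the source. The Conformer encoders are replaced by precomputed embedding matrices, and the function names are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(text_emb: np.ndarray,
                       series_emb: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Hypothetical CLIP-style loss (assumed, not taken from the paper).

    Row i of text_emb and row i of series_emb form a matched caption/segment
    pair; every other row in the batch serves as an in-batch negative.
    """
    t = l2_normalize(text_emb)
    s = l2_normalize(series_emb)
    logits = t @ s.T / temperature              # (B, B) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # Text -> segment: each caption must pick its own segment among all segments.
    loss_t2s = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Segment -> text: each segment must pick its own caption among all captions.
    loss_s2t = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_t2s + loss_s2t))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    captions = rng.normal(size=(8, 16))              # stand-in text embeddings
    matched = captions + 0.05 * rng.normal(size=(8, 16))
    unrelated = rng.normal(size=(8, 16))
    print("aligned pairs:  ", symmetric_info_nce(captions, matched))
    print("unrelated pairs:", symmetric_info_nce(captions, unrelated))
```

Each row of the similarity matrix uses the other captions or segments in the batch as negatives, so larger batches supply more negatives per training step. Averaging both directions keeps the embedding space usable for text-to-segment search as well as the reverse.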
Key Points
- ▸ LaSTR uses natural language queries to retrieve relevant time-series segments
- ▸ The training data pairs segments with captions: TV2-based segmentation is applied to LOTSA windows, and GPT-5.2 generates a description for each resulting segment
- ▸ LaSTR outperforms random and CLIP baselines on single-positive retrieval and on caption-side consistency (SBERT and VLM-as-a-judge) across multiple candidate pool sizes
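In single-positive retrieval, each query caption has exactly one matching segment, which is hidden among distractors. The sketch below shows one way to run this evaluation at several candidate pool sizes. It assumes cosine-similarity scoring and uniformly sampled distractors, and it reports Recall@k plus mean reciprocal rank (MRR). The summary does not list the paper's exact ranking metrics, and `evaluate` and `rank_of_positive` are hypothetical helpers.

```python
import numpy as np


def rank_of_positive(query: np.ndarray, pool: np.ndarray) -> int:
    """Return the 1-based rank of pool[0] (the positive) under cosine similarity."""
    q = query / np.linalg.norm(query)
    p = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    scores = p @ q
    # Count candidates scoring strictly higher than the positive (ties favor the positive).
    return int((scores > scores[0]).sum()) + 1


def evaluate(text_emb: np.ndarray, series_emb: np.ndarray, pool_size: int,
             rng: np.random.Generator, ks=(1, 5)) -> dict:
    """Hypothetical single-positive evaluation.

    For query i, the candidate pool is segment i (the only positive) plus
    pool_size - 1 distractor segments sampled without replacement from the
    other pairs. Requires pool_size <= number of pairs.
    """
    n = text_emb.shape[0]
    ranks = []
    for i in range(n):
        others = np.delete(np.arange(n), i)
        negatives = rng.choice(others, size=pool_size - 1, replace=False)
        pool = series_emb[np.concatenate(([i], negatives))]
        ranks.append(rank_of_positive(text_emb[i], pool))
    ranks = np.array(ranks)
    metrics = {f"R@{k}": float((ranks <= k).mean()) for k in ks}
    metrics["MRR"] = float((1.0 / ranks).mean())
    return metrics


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(50, 32))                  # stand-in caption embeddings
    series = text + 0.5 * rng.normal(size=(50, 32))   # noisy "retriever" embeddings
    for size in (10, 25, 50):
        print(size, evaluate(text, series, pool_size=size, rng=rng))
```

If the segment embeddings are drawn independently of the captions, Recall@1 should land near 1/pool_size, which is what a random baseline scores. Larger pools contain more distractors and make the task harder. This is why results are worth reporting at several pool sizes rather than one.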
Merits
Effective Retrieval
LaSTR lets users locate relevant local segments by describing them in natural language, instead of hand-crafting similarity criteria or relying on global, series-level descriptions.
Demerits
Dependence on Training Data
LaSTR's performance may be limited by the quality and availability of large-scale segment–caption training data, which is time-consuming and resource-intensive to generate. In this work, the captions are produced automatically by GPT-5.2 rather than by human annotators.
Expert Commentary
LaSTR is a notable step forward in time-series retrieval. By pairing natural language queries with a contrastively trained retriever, it lets analysts search for local segments without designing similarity criteria by hand. Language-based segment search could be useful in many domains that rely on time-series analysis, such as finance and healthcare. However, the reported evaluation uses a held-out split of the LOTSA-derived data rather than domain-specific benchmarks. The dependence on high-quality training data, along with data privacy and security concerns, must be carefully addressed before LaSTR can be widely adopted and used responsibly.
Recommendations
- ✓ Further research is needed to explore the applications and limitations of LaSTR in various domains and to develop strategies for addressing potential policy issues.
- ✓ The development of LaSTR should be accompanied by the creation of guidelines and regulations for responsible data collection, storage, and analysis to ensure the protection of sensitive information and intellectual property.