
LaSTR: Language-Driven Time-Series Segment Retrieval

arXiv:2603.00725v1. Abstract: Effectively searching time-series data is essential for system analysis, but existing methods often require expert-designed similarity criteria or rely on global, series-level descriptions. We study language-driven segment retrieval: given a natural language query, the goal is to retrieve relevant local segments from large time-series repositories. We build large-scale segment-caption training data by applying TV2-based segmentation to LOTSA windows and generating segment descriptions with GPT-5.2, and then train a Conformer-based contrastive retriever in a shared text-time-series embedding space. On a held-out test split, we evaluate single-positive retrieval together with caption-side consistency (SBERT and VLM-as-a-judge) under multiple candidate pool sizes. Across all settings, LaSTR outperforms random and CLIP baselines, yielding improved ranking quality and stronger semantic agreement between retrieved segments and query intent.

Kota Dohi, Harsh Purohit, Tomoya Nishida, Takashi Endo, Yusuke Ohtsubo, Koichiro Yawata, Koki Takeshita, Tatsuya Sasaki, Yohei Kawaguchi · March 4, 2026

#cs.CL

Executive Summary

The article introduces LaSTR, a system for language-driven time-series segment retrieval: given a natural language query, it retrieves relevant local segments from large time-series repositories. LaSTR is trained on large-scale segment-caption data and uses a Conformer-based contrastive retriever that embeds text and time-series segments in a shared space. On a held-out test split, it outperforms random and CLIP baselines across multiple candidate pool sizes, with better ranking quality and stronger semantic agreement between retrieved segments and query intent. The approach targets a gap left by existing methods, which often require expert-designed similarity criteria or rely on global, series-level descriptions.
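To make the idea of a contrastive retriever in a shared embedding space more concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over a batch of paired caption and segment embeddings. The abstract does not state the exact loss LaSTR uses, so the CLIP-style symmetric form, the temperature value of 0.07, and all function names are assumptions for illustration. The encoders themselves (the Conformer and the text model) are omitted; the inputs are plain lists of floats standing in for their outputs.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_sim_matrix(text_embs, seg_embs):
    """Entry [i][j] is the cosine similarity of caption i and segment j."""
    t = [l2_normalize(v) for v in text_embs]
    s = [l2_normalize(v) for v in seg_embs]
    return [[sum(a * b for a, b in zip(ti, sj)) for sj in s] for ti in t]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(text_embs, seg_embs, temperature=0.07):
    """Average of the text-to-segment and segment-to-text losses.

    Assumption: pair i in text_embs matches pair i in seg_embs, and every
    other item in the batch is treated as a negative.
    """
    sims = cosine_sim_matrix(text_embs, seg_embs)
    logits = [[x / temperature for x in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))
```

Minimizing a loss of this kind pulls each caption toward its own segment and away from the other segments in the batch, which is what allows a text query to be compared directly against segment embeddings at retrieval time.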

Key Points

▸ LaSTR uses natural language queries to retrieve relevant time-series segments
▸ The training data pairs segments, obtained by applying TV2-based segmentation to LOTSA windows, with captions generated by GPT-5.2
▸ On a held-out test split, LaSTR outperforms random and CLIP baselines in single-positive retrieval and in caption-side consistency (SBERT and VLM-as-a-judge) across multiple candidate pool sizes
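The single-positive retrieval setup mentioned above can be sketched as follows: each query has exactly one correct segment, which is ranked against negatives sampled to fill a candidate pool of a given size. The abstract does not name the ranking metrics, so Recall@K and mean reciprocal rank (MRR) are assumptions here, as are the function names, the dot-product scoring, and the uniform sampling of negatives.

```python
import random


def rank_of_positive(query_emb, candidates, positive_idx):
    """1-based rank of the positive candidate under dot-product scoring.

    Ties are resolved in favor of the positive (only strictly higher
    scores push it down), which is an optimistic convention.
    """
    scores = [sum(q * c for q, c in zip(query_emb, cand)) for cand in candidates]
    pos = scores[positive_idx]
    return 1 + sum(1 for j, s in enumerate(scores) if j != positive_idx and s > pos)


def evaluate(query_embs, seg_embs, pool_size, ks=(1, 5), seed=0):
    """Recall@K and MRR when query i's only positive is segment i.

    Each pool holds the positive plus (pool_size - 1) negatives drawn
    uniformly without replacement from the other segments.
    """
    n = len(query_embs)
    if not 1 <= pool_size <= n:
        raise ValueError("pool_size must be between 1 and the number of segments")
    rng = random.Random(seed)
    hits = {k: 0 for k in ks}
    reciprocal_rank_sum = 0.0
    for i in range(n):
        negatives = [j for j in range(n) if j != i]
        sampled = rng.sample(negatives, pool_size - 1)
        pool = [seg_embs[i]] + [seg_embs[j] for j in sampled]
        rank = rank_of_positive(query_embs[i], pool, 0)
        for k in ks:
            hits[k] += int(rank <= k)
        reciprocal_rank_sum += 1.0 / rank
    return {
        "recall": {k: hits[k] / n for k in ks},
        "mrr": reciprocal_rank_sum / n,
    }
```

Larger pools make the task harder because the positive competes with more negatives, which is presumably why results are reported under several pool sizes.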

Merits

Effective Retrieval

LaSTR lets users search for local time-series segments with natural language queries instead of expert-designed similarity criteria. In the reported evaluation, it ranks relevant segments better than random and CLIP baselines.

Demerits

Dependence on Training Data

LaSTR's performance may be limited by the quality and availability of large-scale segment-caption training data. The captions are generated with GPT-5.2, so producing them at scale can be time-consuming and resource-intensive.

Expert Commentary

LaSTR applies natural language processing and contrastive learning to time-series retrieval, with the aim of making system analysis more efficient. Retrieving relevant segments from language queries could be useful in domains such as finance and healthcare, although the abstract reports results only on a held-out test split. The dependence on high-quality training data, along with potential policy issues around data privacy and security, must be addressed before LaSTR can be adopted widely and used responsibly.

Recommendations

✓ Further research is needed to explore the applications and limitations of LaSTR in various domains and to develop strategies for addressing potential policy issues.
✓ The development of LaSTR should be accompanied by the creation of guidelines and regulations for responsible data collection, storage, and analysis to ensure the protection of sensitive information and intellectual property.

Sources

arXiv - cs.CL

LaSTR: Language-Driven Time-Series Segment Retrieval


