The PM Torch

Cut through the noise -- AI PM updates that matter

437 articles · last 7 days
arXiv cs.AI · 15h ago

Context-Enriched Natural Language Descriptions of Vessel Trajectories

arXiv:2603.12287v1 Announce Type: new Abstract: We address the problem of transforming raw vessel trajectory data collected from AIS into structured and semantically enriched representations interpretable by humans and directly usable by machine reasoning systems. We propose a context-aware trajectory abstraction framework that segments noisy AIS sequences into distinct trips each consisting of clean, mobility-annotated episodes. Each episode is further enriched with multi-source contextual...

AI Research · ai · data · by Kostas Patroumpas, Alexandros Troupiotis-Kapeliaris, Giannis Spiliopoulos, Panagiotis Betchavas, Dimitrios Skoutas, Dimitris Zissis, Nikos Bikakis
arXiv cs.AI · 15h ago

Efficient Reasoning with Balanced Thinking

arXiv:2603.12372v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite inherent capabilities. These issues lead to inefficiencies and potential inaccuracies, limiting practical deployment in resource-constrained settings. Existing methods to mitigate overthinking, such as...

AI Research · ai · by Yulin Li, Tengyao Tu, Li Ding, Junjie Wang, Huiling Zhen, Yixin Chen, Yong Li, Zhuotao Tian
arXiv cs.AI · 15h ago

Generating Expressive and Customizable Evals for Timeseries Data Analysis Agents with AgentFuel

arXiv:2603.12483v1 Announce Type: new Abstract: Across many domains (e.g., IoT, observability, telecommunications, cybersecurity), there is an emerging adoption of conversational data analysis agents that enable users to "talk to your data" to extract insights. Such data analysis agents operate on timeseries data models; e.g., measurements from sensors or events monitoring user clicks and actions in product analytics. We evaluate 6 popular data analysis agents (both open-source and proprietary)...

AI Research · ai · data · by Aadyaa Maddi, Prakhar Naval, Deepti Mande, Shane Duan, Muckai Girish, Vyas Sekar
arXiv cs.AI · 15h ago

AI Planning Framework for LLM-Based Web Agents

arXiv:2603.12710v1 Announce Type: new Abstract: Developing autonomous agents for web-based tasks is a core challenge in AI. While Large Language Model (LLM) agents can interpret complex user requests, they often operate as black boxes, making it difficult to diagnose why they fail or how they plan. This paper addresses this gap by formally treating web tasks as sequential decision-making processes. We introduce a taxonomy that maps modern agent architectures to traditional planning paradigms:...

AI Research · ai · llm · by Orit Shahnovsky, Rotem Dror
arXiv cs.AI · 15h ago

On Using Machine Learning to Early Detect Catastrophic Failures in Marine Diesel Engines

arXiv:2603.12733v1 Announce Type: new Abstract: Catastrophic failures of marine engines imply severe loss of functionality and destroy or damage the systems irreversibly. Being sudden and often unpredictable events, they pose a severe threat to navigation, crew, and passengers. The abrupt nature makes early detection the only effective countermeasure. However, research has concentrated on modeling the gradual degradation of components, with limited attention to sudden and anomalous phenomena....

AI Research · ai · machine learning · by Francesco Maione, Paolo Lino, Giuseppe Giannino, Guido Maione
arXiv cs.AI · 15h ago

ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning

arXiv:2603.12740v1 Announce Type: new Abstract: Large Language Model (LLM) agents are increasingly applied to complex, multi-step tasks that require interaction with diverse external tools across various domains. However, current LLM agent tool planning methods typically rely on greedy, reactive tool selection strategies that lack foresight and fail to account for inter-tool dependencies. In this paper, we present ToolTree, a novel Monte Carlo tree search-inspired planning paradigm for tool...

AI Research · ai · llm · by Shuo Yang, Soyeon Caren Han, Yihao Ding, Shuhe Wang, Eduard Hoy
arXiv cs.AI · 15h ago

AI Model Modulation with Logits Redistribution

arXiv:2603.12755v1 Announce Type: new Abstract: Large-scale models are typically adapted to meet the diverse requirements of model owners and users. However, maintaining multiple specialized versions of the model is inefficient. In response, we propose AIM, a novel model modulation paradigm that enables a single model to exhibit diverse behaviors to meet the specific end requirements. AIM enables two key modulation modes: utility and focus modulations. The former provides model owners with...

AI Research · ai · by Zihan Wang, Zhongkui Ma, Xinguo Feng, Zhiyang Mei, Ethan Ma, Derui Wang, Minhui Xue, Guangdong Bai
arXiv cs.AI · 15h ago

Context is all you need: Towards autonomous model-based process design using agentic AI in flowsheet simulations

arXiv:2603.12813v1 Announce Type: new Abstract: Agentic AI systems integrating large language models (LLMs) with reasoning and tool-use capabilities are transforming various domains - in particular, software development. In contrast, their application in chemical process flowsheet modelling remains largely unexplored. In this work, we present an agentic AI framework that delivers assistance in an industrial flowsheet simulation environment. To this end, we show the capabilities of GitHub Copilot...

AI Tools · ai · llm · by Pascal Schäfer, Lukas J. Krinke, Martin Wlotzka, Norbert Asprion
arXiv cs.AI · 15h ago

ODRL Policy Comparison Through Normalisation

arXiv:2603.12926v1 Announce Type: new Abstract: The ODRL language has become the standard for representing policies and regulations for digital rights. However its complexity is a barrier to its usage, which has caused many related theoretical and practical works to focus on different, and not interoperable, fragments of ODRL. Moreover, semantically equivalent policies can be expressed in numerous different ways, which makes comparing them and processing them harder. Building on top of a...

Industry News · rag · regulation · by Jaime Osvaldo Salas, Paolo Pareti, George Konstantinidis
arXiv cs.AI · 15h ago

Efficient and Interpretable Multi-Agent LLM Routing via Ant Colony Optimization

arXiv:2603.12933v1 Announce Type: new Abstract: Large Language Model (LLM)-driven Multi-Agent Systems (MAS) have demonstrated strong capability in complex reasoning and tool use, and heterogeneous agent pools further broaden the quality-cost trade-off space. Despite these advances, real-world deployment is often constrained by high inference cost, latency, and limited transparency, which hinders scalable and efficient routing. Existing routing strategies typically rely on expensive LLM-based...

AI Research · ai · llm · by Xudong Wang, Chaoning Zhang, Jiaquan Zhang, Chenghao Li, Qigan Sun, Sung-Ho Bae, Peng Wang, Ning Xie, Jie Zou, Yang Yang, Hengtao Shen
arXiv cs.AI · 15h ago

Structured Distillation for Personalized Agent Memory: 11x Token Reduction with Retrieval Preservation

arXiv:2603.13017v1 Announce Type: new Abstract: Long conversations with an AI agent create a simple problem for one user: the history is useful, but carrying it verbatim is expensive. We study personalized agent memory: one user's conversation history with an agent, distilled into a compact retrieval layer for later search. Each exchange is compressed into a compound object with four fields (exchange_core, specific_context, thematic room_assignments, and regex-extracted files_touched). The...

AI Research · ai · ai agent · by Sydney Lewis
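
The four-field compound object named in the abstract above can be pictured with a minimal sketch. Only the four field names come from the abstract; the class name `DistilledExchange`, the helper `extract_files`, the field types, and the file-path regex are hypothetical illustrations, not the paper's implementation.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern: a path-like token ending in a short file extension.
FILE_PATTERN = re.compile(r"\b[\w./-]+\.[A-Za-z0-9]{1,5}\b")


@dataclass
class DistilledExchange:
    """One compressed exchange; field names follow the abstract, types are assumed."""
    exchange_core: str
    specific_context: str
    room_assignments: list[str] = field(default_factory=list)
    files_touched: list[str] = field(default_factory=list)


def extract_files(text: str) -> list[str]:
    """Return file-like tokens found in text, de-duplicated in first-seen order."""
    return list(dict.fromkeys(FILE_PATTERN.findall(text)))


raw = "Fixed the parser in src/parse.py, then updated README.md and src/parse.py again."
record = DistilledExchange(
    exchange_core="Fixed a parser bug and updated the docs.",
    specific_context="Parser lives in src/parse.py.",
    room_assignments=["parsing", "documentation"],
    files_touched=extract_files(raw),
)
print(record.files_touched)
```

A simple regex like this will also catch non-file tokens such as abbreviations containing a dot, so a real retrieval layer would likely need a stricter pattern.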
arXiv cs.AI · 15h ago

Beyond Final Answers: CRYSTAL Benchmark for Transparent Multimodal Reasoning Evaluation

arXiv:2603.13099v1 Announce Type: new Abstract: We introduce CRYSTAL (Clear Reasoning via Yielded Steps, Traceability and Logic), a diagnostic benchmark with 6,372 instances that evaluates multimodal reasoning through verifiable intermediate steps. We propose two complementary metrics: Match F1, which scores step-level precision and recall via semantic similarity matching, and Ordered Match F1, which further penalizes disordered reasoning chains. References are...

AI Research · ai · by Wayner Barrios, SouYoung Jin
arXiv cs.AI · 15h ago

Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation

arXiv:2603.13131v1 Announce Type: new Abstract: Open-world embodied agents must solve long-horizon tasks where the main bottleneck is not single-step planning quality but how interaction experience is organized and evolved. To this end, we present Steve-Evolving, a non-parametric self-evolving framework that tightly couples fine-grained execution diagnosis with dual-track knowledge distillation in a closed loop. The method follows three phases: Experience Anchoring, Experience Distillation, and...

Practical Guides · ai · by Zhengwei Xie, Zhisheng Chen, Ziyan Weng, Tingyu Wu, Chenglong Li, Vireo Zhang, Kun Wang
arXiv cs.AI · 15h ago

When Right Meets Wrong: Bilateral Context Conditioning with Reward-Confidence Correction for GRPO

arXiv:2603.13134v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) has emerged as an effective method for training reasoning models. While it computes advantages based on group mean, GRPO treats each output as an independent sample during the optimization and overlooks a vital structural signal: the natural contrast between correct and incorrect solutions within the same group, thus ignoring the rich, comparative data that could be leveraged by explicitly pitting...

AI Research · ai · rag · data · by Yu Li, Tian Lan, Zhengling Qi
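
For readers unfamiliar with the group-mean baseline the abstract above mentions, here is a minimal sketch. It assumes the commonly described GRPO formulation, in which each sampled output's reward is centered on the group mean and scaled by the group standard deviation. The function name, the `eps` guard, and the choice of population standard deviation are illustrative assumptions; the paper's own bilateral context conditioning and reward-confidence correction are not shown.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled output relative to the other outputs in its group."""
    mu = mean(rewards)        # group-mean baseline
    sigma = pstdev(rewards)   # population std of the group (an assumption)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled answers: two correct (reward 1), two incorrect (reward 0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

In this sketch each output is scored only against group statistics, never paired directly with another output, which is the independence the abstract points to.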
arXiv cs.AI · 15h ago

Developing and evaluating a chatbot to support maternal health care

arXiv:2603.13168v1 Announce Type: new Abstract: The ability to provide trustworthy maternal health information using phone-based chatbots can have a significant impact, particularly in low-resource settings where users have low health literacy and limited access to care. However, deploying such systems is technically challenging: user queries are short, underspecified, and code-mixed across languages, answers require regional context-specific grounding, and partial or missing symptom context...

AI Research · by Smriti Jha, Vidhi Jain, Jianyu Xu, Grace Liu, Sowmya Ramesh, Jitender Nagpal, Gretchen Chapman, Benjamin Bellows, Siddhartha Goyal, Aarti Singh, Bryan Wilder
Hacker News · 17h ago

Pulsed High-Power Radio Energy Can Cause Harmful Effects on the Brain (2024)

Industry News · ai · by greesil
Hacker News · 18h ago

How I write software with LLMs

AI Tools · llm · by indigodaddy
Hacker News · 18h ago

Quillx is an open standard for disclosing AI involvement in software projects

AI Tools · ai · by qainsights
TechCrunch AI · 18h ago

Google, Accel India accelerator chooses 5 startups and none are ‘AI wrappers’

Google and Accel say about 70% of AI startup pitches tied to India were "wrappers" as they reviewed more than 4,000 applications for their Atoms cohort.

AI Tools · ai · google · startup · by Jagmeet Singh
Hacker News · 19h ago

Cannabinoids remove plaque-forming Alzheimer's proteins from brain cells (2016)

Industry News · ai · by anjel