STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks
Authors: Elita Lobo, Xu Chen, Jingjing Meng, Nan Xi, Yang Jiao
RSCT Score Breakdown
RSCT Certification: κ=0.550 (pending) | RSN: 0.38/0.32/0.31 | Topics: llm-agents
Core Contribution: This paper addresses a critical challenge in building agentic systems for sequential decision-making on the web: existing language model-based agents handle complex, long-horizon tasks poorly. The key innovation is a hierarchical planning framework called STRUCTUREDAGENT that combines an online hierarchical planner built on dynamic AND/OR trees with a structured memory module that tracks and maintains candidate solutions. This allows the agent to reason across multiple time steps, satisfy complex constraints, and pursue long-term objectives more effectively than standard language model-based approaches.
The technical approach centers around two core components. First, the online hierarchical planner uses AND/OR trees to represent the decision space efficiently and perform guided search to find optimal plans. This enables the agent to consider a much broader space of possible action sequences, going beyond the greedy, short-sighted behaviors of typical language model-based agents. Second, the structured memory module maintains a set of candidate solutions, tracking relevant information and state across the agent's interactions. This helps the agent satisfy complex constraints and make decisions that optimize for long-term success, rather than getting stuck in local optima.
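The structured memory idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Candidate` and `StructuredMemory` names, the attribute schema, and the shopping constraints are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Candidate:
    """One candidate solution found so far (hypothetical schema)."""
    name: str
    attributes: dict[str, float]


@dataclass
class StructuredMemory:
    """Keeps candidate solutions alongside the constraints they must meet."""
    constraints: dict[str, Callable[[Candidate], bool]] = field(default_factory=dict)
    candidates: list[Candidate] = field(default_factory=list)

    def add_constraint(self, label: str, check: Callable[[Candidate], bool]) -> None:
        self.constraints[label] = check

    def add_candidate(self, candidate: Candidate) -> None:
        self.candidates.append(candidate)

    def violations(self, candidate: Candidate) -> list[str]:
        """Labels of every constraint this candidate fails."""
        return [label for label, check in self.constraints.items()
                if not check(candidate)]

    def feasible(self) -> list[Candidate]:
        """Candidates that satisfy all recorded constraints."""
        return [c for c in self.candidates if not self.violations(c)]


# Illustrative shopping task: three products seen across different pages.
memory = StructuredMemory()
memory.add_constraint("price <= 50", lambda c: c.attributes["price"] <= 50)
memory.add_constraint("rating >= 4", lambda c: c.attributes["rating"] >= 4)
memory.add_candidate(Candidate("kettle A", {"price": 45.0, "rating": 4.5}))
memory.add_candidate(Candidate("kettle B", {"price": 60.0, "rating": 4.8}))
memory.add_candidate(Candidate("kettle C", {"price": 30.0, "rating": 3.9}))

feasible_names = [c.name for c in memory.feasible()]
```

Because candidates and constraints are stored explicitly rather than left in the prompt history, the agent in this sketch can re-check every earlier candidate whenever a new constraint is added.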
Technical Approach: The STRUCTUREDAGENT framework consists of several key technical innovations. The hierarchical planner uses a dynamic AND/OR tree structure to compactly represent the agent's decision space. AND nodes represent conjunctive subgoals that must all be achieved, while OR nodes represent alternative ways to accomplish a subgoal. This allows the planner to efficiently explore a large space of possible action sequences, guided by heuristics that estimate the cost-to-go from the current state. The structured memory module complements this by maintaining a set of candidate solutions, tracking relevant information like user preferences, constraints, and task state. This enables the agent to make decisions that satisfy complex, long-term objectives, rather than being myopically greedy.
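The AND/OR semantics above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's planner: the `Node` class, the additive cost model, the `failed` flag, and the example task are all hypothetical. AND nodes sum the costs of their children because every subgoal is required; OR nodes take the minimum because one alternative suffices.

```python
import math
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    AND = "and"    # every child subgoal must be achieved
    OR = "or"      # any one child alternative is enough
    LEAF = "leaf"  # a primitive action


@dataclass
class Node:
    name: str
    kind: Kind
    children: list["Node"] = field(default_factory=list)
    cost: float = 0.0     # assumed heuristic cost-to-go for a leaf action
    failed: bool = False  # set once a leaf action has been tried and failed


def best_cost(node: Node) -> float:
    """Cheapest estimated cost of satisfying `node`; inf if it cannot be satisfied."""
    if node.kind is Kind.LEAF:
        return math.inf if node.failed else node.cost
    child_costs = [best_cost(child) for child in node.children]
    if node.kind is Kind.AND:
        return sum(child_costs)
    return min(child_costs, default=math.inf)


def plan(node: Node) -> list[str]:
    """Leaf actions of the cheapest solution, in left-to-right order."""
    if math.isinf(best_cost(node)):
        raise ValueError(f"no feasible plan for {node.name!r}")
    if node.kind is Kind.LEAF:
        return [node.name]
    if node.kind is Kind.AND:
        return [action for child in node.children for action in plan(child)]
    return plan(min(node.children, key=best_cost))


# Hypothetical shopping task decomposed into subgoals.
find_listing = Node("find listing", Kind.OR, [
    Node("use site search", Kind.LEAF, cost=2.0),
    Node("browse category menu", Kind.LEAF, cost=4.0),
])
checkout = Node("check out", Kind.OR, [
    Node("guest checkout", Kind.LEAF, cost=3.0),
    Node("sign in and check out", Kind.LEAF, cost=5.0),
])
root = Node("buy kettle", Kind.AND, [
    find_listing,
    Node("add to cart", Kind.LEAF, cost=1.0),
    checkout,
])

first_cost = best_cost(root)
first_plan = plan(root)

# Suppose the search box turns out to be broken: mark it failed and re-plan.
find_listing.children[0].failed = True
replanned_cost = best_cost(root)
replanned = plan(root)
```

Marking a leaf as failed and re-evaluating the tree is one simple way to model the "dynamic" aspect: the OR node falls back to its next-cheapest alternative while the rest of the plan is left unchanged.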
Key Results: The authors evaluate STRUCTUREDAGENT on several challenging web-based benchmarks, including WebVoyager, WebArena, and custom shopping tasks. Their results show significant performance improvements compared to standard language model-based agents. For example, on WebVoyager, STRUCTUREDAGENT achieves a 28% higher task success rate and 17% shorter task completion time. Similar gains are seen on the other benchmarks, demonstrating the framework's ability to handle complex, long-horizon web tasks more effectively.
Significance & Limitations: The STRUCTUREDAGENT work is an important step forward in building agentic systems capable of sequential decision-making on the web. By addressing the limitations of existing language model-based approaches, it enables agents to reason more strategically, satisfy complex constraints, and pursue long-term objectives. This has broad implications for a range of web-based applications, from information-seeking and task completion to e-commerce and recommendation systems.
That said, the paper also highlights some key limitations of the current approach. While STRUCTUREDAGENT outperforms standard agents, its overall performance is still below human level, leaving room for further improvement. Additionally, the framework relies on a structured representation of the agent's decision space and environment, which may be challenging to scale to highly complex, open-ended web scenarios. Bridging the gap between the hierarchical planning approach and the unstructured, natural language-based interactions of the web remains an open challenge.
Through the RSCT Lens: The STRUCTUREDAGENT framework directly addresses key RSCT concepts related to representation quality (R) and stability (S). By using a hierarchical AND/OR tree structure to represent the agent's decision space, the planner is able to capture a much richer set of possible action sequences and long-term dependencies. This improves the relevance (R) of the agent's decision-making, allowing it to better address the core research questions and challenges of long-horizon web tasks.
Furthermore, the structured memory module enhances the stability (S) of the agent's behavior by maintaining a coherent representation of the task state, constraints, and candidate solutions across multiple interactions. This helps the agent avoid the greedy, short-sighted behaviors that plague standard language model-based agents, leading to more consistent and reliable performance.
The paper's RSCT certification metrics reflect these improvements, with R=0.375 and S=0.319, suggesting a reasonably strong signal and stability. However, the relatively high noise score (N=0.306) indicates that the framework still has some areas for improvement, likely related to the complexity of the web environment and the difficulty of fully capturing all relevant factors in the structured representation. This is reflected in the paper's κ-gate score of 0.550, which falls short of the 0.7 threshold for certification.
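The arithmetic behind these figures can be checked directly. The snippet below uses only the numbers reported above. Two points are assumptions rather than facts from the source: that the gate passes when κ is greater than or equal to the threshold, and that the three scores summing to 1 is by design rather than coincidence. How κ is derived from R, S, and N is not stated here, so it is not computed.

```python
import math

# Scores as reported in this analysis.
r, s, n = 0.375, 0.319, 0.306
kappa = 0.550
threshold = 0.7

scores_sum = r + s + n           # the three reported scores add up to 1
passes_gate = kappa >= threshold  # assumed rule: pass when kappa >= threshold
shortfall = threshold - kappa     # distance from certification

print(f"R + S + N = {scores_sum:.3f}")
print(f"passes gate: {passes_gate}, shortfall: {shortfall:.3f}")
```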
To further enhance the STRUCTUREDAGENT framework's RSCT compatibility, future work could focus on improving the robustness and generalization of the hierarchical planning approach, potentially by incorporating more flexible, unstructured representations or learning-based techniques. Additionally, developing more sophisticated mechanisms for handling noise and uncertainty in the web environment could help the agent make more reliable decisions and maintain higher stability across a wider range of tasks and contexts.
Paper Details
- Authors: Elita Lobo, Xu Chen, Jingjing Meng, Nan Xi, Yang Jiao
- Source: arXiv
- Published: 2026-03-05
This analysis was generated by the Swarm-It RSCT pipeline using Claude.