Workflow Configuration¶

Step control, security settings, follow-up question generation, and performance optimizations.

flowchart LR
    A[Question] --> B[PII Detection]
    B --> C[Parse Question]
    C --> D[Get Unique Nouns]
    D --> E[Generate SQL]
    E --> F[Validate & Fix SQL]
    F --> G[Execute SQL]
    G --> H[Format Results]
    H --> I[Choose & Format Viz]
    I --> J[Follow-up Questions]
    J --> K[Response]
    style A fill:#2F5496,color:#fff
    style B fill:#4CAF50,color:#fff
    style C fill:#0288D1,color:#fff
    style D fill:#4CAF50,color:#fff
    style E fill:#0288D1,color:#fff
    style F fill:#0288D1,color:#fff
    style G fill:#4CAF50,color:#fff
    style H fill:#0288D1,color:#fff
    style I fill:#0288D1,color:#fff
    style J fill:#0288D1,color:#fff
    style K fill:#7B1FA2,color:#fff

Each step above can be individually enabled or disabled via the steps configuration:

workflow:
  # Basic workflow settings
  max_retries: 3                            # SQL error retry attempts
  timeout_per_step: 120                     # Timeout per workflow step (seconds)
  output_format: "json"                     # Output format preference
  include_metadata: true                    # Include metadata in responses
  include_query_info: true                  # Include query information

  # Step control
  steps:
    parse_question: true
    get_unique_nouns: true
    generate_sql: true
    validate_and_fix_sql: true
    execute_sql: true
    format_results: true
    choose_visualization: true
    format_data_for_visualization: true
    generate_followup_questions: true     # Generate contextual follow-up questions

  # Input Validation (New in v0.2.1)
  input_validation:
    max_question_length: 10000              # Maximum question length
    blocked_substrings:                     # Block potentially harmful content
      - "<script"
      - "javascript:"
      - "data:"
      - "vbscript:"
      - "@@"

  # Parse Overrides (New in v0.2.1)
  parse_overrides:                          # Short-circuit parsing for special cases
    - enabled: true
      match_any_keywords:                   # Keywords to match
        - "survey"
        - "specific_dataset"
      parsed_response:                      # Pre-defined response
        is_relevant: true
        relevant_tables:
          - table_name: "project.dataset.table"
            relevance_reason: "Explicitly mentioned table"
            noun_columns: ["id", "name"]
            numeric_columns: ["count"]
        question_type: "analysis"
        analysis_type: "data_exploration"

  # SQL Safety (New in v0.2.1)
  sql_safety:
    allowed_query_types: ["SELECT", "WITH"] # Only allow safe query types
    forbidden_patterns:                     # Block dangerous SQL patterns
      - "DROP"
      - "DELETE"
      - "TRUNCATE"
      - "ALTER"
      - "CREATE"
      - "INSERT"
      - "UPDATE"
      - "GRANT"
      - "REVOKE"
      - "EXEC"
      - "EXECUTE"
    suspicious_functions:                   # Block suspicious SQL functions
      - "OPENROWSET"
      - "OPENDATASOURCE"
      - "XP_"
      - "SP_"
      - "DBMS_"
      - "UTL_FILE"
      - "UTL_HTTP"
      - "BULK"
      - "OUTFILE"
      - "DUMPFILE"
    max_sql_length: 50000                   # Maximum SQL query length

  # Conversation Context (New in v0.2.1)
  conversation_context:
    max_history_messages: 6                 # Maximum messages to keep in context

Enhanced Security Features¶

Input Validation:

- Protects against injection attacks and malformed input
- Configurable content filtering and length limits
- Customizable blocked substring patterns
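
As a rough illustration of how the `input_validation` settings could be applied, here is a minimal sketch. `InputValidationConfig` and `validate_question` are hypothetical names, and the case-insensitive substring match is an assumption; the library's actual check may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InputValidationConfig:
    # Mirrors the keys under workflow.input_validation (hypothetical class).
    max_question_length: int = 10000
    blocked_substrings: List[str] = field(
        default_factory=lambda: ["<script", "javascript:", "data:", "vbscript:", "@@"]
    )


def validate_question(question: str, cfg: InputValidationConfig) -> Optional[str]:
    """Return a rejection reason, or None if the question passes."""
    if len(question) > cfg.max_question_length:
        return f"question exceeds {cfg.max_question_length} characters"
    lowered = question.lower()
    for blocked in cfg.blocked_substrings:
        # Case-insensitive matching is an assumption of this sketch.
        if blocked.lower() in lowered:
            return f"question contains blocked substring {blocked!r}"
    return None
```

Under this sketch, plain substring matching is coarse: a harmless question that happens to contain `data:` would also be rejected, so the blocked list is worth reviewing against real user questions.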

SQL Safety:

- Multi-layer SQL injection protection
- Configurable allowed query types and forbidden patterns
- Detection of suspicious functions and operations
- Query length limits to prevent resource exhaustion
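
The layered check described here might look roughly like the following sketch. `check_sql` is a hypothetical helper, and the matching rules (first-keyword test, whole-word match for forbidden patterns, prefix match for suspicious functions) are assumptions rather than the library's documented behavior.

```python
import re
from typing import Optional

# Values copied from the sql_safety example configuration.
ALLOWED_QUERY_TYPES = ["SELECT", "WITH"]
FORBIDDEN_PATTERNS = ["DROP", "DELETE", "TRUNCATE", "ALTER", "CREATE", "INSERT",
                      "UPDATE", "GRANT", "REVOKE", "EXEC", "EXECUTE"]
SUSPICIOUS_FUNCTIONS = ["OPENROWSET", "OPENDATASOURCE", "XP_", "SP_", "DBMS_",
                        "UTL_FILE", "UTL_HTTP", "BULK", "OUTFILE", "DUMPFILE"]
MAX_SQL_LENGTH = 50000


def check_sql(sql: str) -> Optional[str]:
    """Return a rejection reason, or None if the query passes every layer."""
    if len(sql) > MAX_SQL_LENGTH:
        return "query exceeds max_sql_length"
    first = re.match(r"\s*([A-Za-z]+)", sql)
    if first is None or first.group(1).upper() not in ALLOWED_QUERY_TYPES:
        return "query type is not allowed"
    upper = sql.upper()
    for pattern in FORBIDDEN_PATTERNS:
        # Whole-word match, so a column such as updated_at is not flagged.
        if re.search(rf"\b{re.escape(pattern)}\b", upper):
            return f"forbidden pattern: {pattern}"
    for func in SUSPICIOUS_FUNCTIONS:
        # Match only at the start of an identifier (e.g. XP_CMDSHELL).
        if re.search(rf"\b{re.escape(func)}", upper):
            return f"suspicious function: {func}"
    return None
```

A keyword scan like this also flags forbidden words inside string literals (for example `WHERE action = 'DELETE'`), which is a trade-off of the simple approach sketched here.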

Parse Overrides:

- Bypass standard parsing for specific use cases
- Pre-defined responses for known keywords or patterns
- Improved performance for common queries
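
To make the short-circuit concrete, the sketch below picks the first enabled override whose keywords appear in the question. `find_parse_override` is a hypothetical helper; first-match-wins ordering and case-insensitive substring matching are assumptions.

```python
from typing import Any, Dict, List, Optional


def find_parse_override(
    question: str, overrides: List[Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Return the parsed_response of the first matching override, else None."""
    lowered = question.lower()
    for override in overrides:
        if not override.get("enabled", False):
            continue  # disabled overrides are ignored
        keywords = override.get("match_any_keywords", [])
        if any(keyword.lower() in lowered for keyword in keywords):
            return override.get("parsed_response")
    return None  # no match: fall through to the normal parse step
```

When a match is found, the pre-defined `parsed_response` would stand in for the output of the parsing step, which is where the performance gain for common queries would come from.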

Conversation Context:

- Intelligent conversation history management
- Configurable context window size
- Optimized for token efficiency in LLM prompts
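
A context window of this kind can be sketched as a simple tail slice over the message list. `trim_history` is a hypothetical helper, and keeping the most recent messages (rather than, say, summarizing older ones) is an assumption.

```python
from typing import Dict, List


def trim_history(
    messages: List[Dict[str, str]], max_history_messages: int = 6
) -> List[Dict[str, str]]:
    """Keep only the most recent max_history_messages entries."""
    if max_history_messages <= 0:
        return []  # guard: messages[-0:] would return the full list
    return messages[-max_history_messages:]
```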

Follow-up Question Generation¶

The follow-up question generation feature provides AI-powered contextual questions to help users explore their data more deeply after receiving initial query results.

Key Benefits¶

🧠 Contextual Intelligence: Generates questions based on actual query results and context
🔄 Exploration Guidance: Suggests natural next steps for data exploration
⚡ Smart Fallbacks: Uses rule-based generation when LLM is unavailable
🎯 Selective Generation: Only generates meaningful questions, returns empty list when appropriate

Configuration¶

workflow:
  steps:
    generate_followup_questions: true         # Enable follow-up question generation

prompts:
  generate_followup_questions:
    system: |
      You are an AI assistant that generates relevant follow-up questions based on a user's database query and results.
      Your goal is to suggest 2-3 questions that would provide additional insights, help users explore the data further,
      or uncover related information they might find valuable.

      Guidelines:
      - Generate 2-3 concise, specific questions
      - Focus on actionable insights and deeper analysis
      - Consider drill-down opportunities (time periods, categories, segments)
      - Suggest comparative analysis when appropriate
      - Avoid generic questions - be specific to the data and context
      - Return questions as a simple numbered list

      Example good follow-up questions:
      - "What are the month-over-month trends for these top categories?"
      - "How do these numbers compare to the same period last year?"
      - "Which specific subcategories drive the highest volume in Customer Service?"

    human: |
      Original question: {question}
      Answer provided: {answer}
      SQL query executed: {sql_query}
      Data summary: {results_summary}
      Context: {context_info}
      Number of result rows: {row_count}

      Based on this information, generate 2-3 relevant follow-up questions that would provide additional insights:
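
Because the system prompt asks for "a simple numbered list", the model's reply has to be turned back into individual questions. The sketch below shows one way to do that; `parse_numbered_questions` is a hypothetical helper and the accepted numbering styles (`1.` and `1)`) are assumptions.

```python
import re
from typing import List

# Matches lines such as "1. question" or "2) question".
_NUMBERED_ITEM = re.compile(r"^\s*\d+[.)]\s*(.*)$")


def parse_numbered_questions(text: str, max_questions: int = 3) -> List[str]:
    """Extract questions from a numbered-list reply, ignoring other lines."""
    questions = []
    for line in text.splitlines():
        match = _NUMBERED_ITEM.match(line)
        if not match:
            continue
        # Drop surrounding whitespace and optional double quotes.
        question = match.group(1).strip().strip('"').strip()
        if question:
            questions.append(question)
    return questions[:max_questions]
```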

How It Works¶

1. Execution Sequence: Runs after format_results so that it has access to the formatted answer
2. Context Analysis: Analyzes the original question, generated answer, SQL query, and result data
3. Intelligent Generation:
   - Primary: Uses LLM with contextual prompts for high-quality questions
   - Fallback: Uses rule-based logic when LLM is unavailable
   - Smart Skip: Returns empty list when no meaningful questions can be generated
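
The primary/fallback/skip behavior described above can be sketched as follows. `generate_followups` and its callable parameters are hypothetical, and treating any exception from the LLM call as "unavailable" is an assumption of this sketch.

```python
from typing import Callable, List, Optional


def generate_followups(
    llm_generate: Optional[Callable[[], List[str]]],
    rule_based: Callable[[], List[str]],
    max_questions: int = 3,
) -> List[str]:
    """Try the LLM first; fall back to rules only if the LLM is unavailable."""
    if llm_generate is None:
        candidates = rule_based()  # no LLM configured
    else:
        try:
            candidates = llm_generate()  # primary path
        except Exception:
            candidates = rule_based()  # LLM call failed: use the fallback
    # Smart skip: blank entries are dropped, so the result may be empty.
    cleaned = [q.strip() for q in candidates if q and q.strip()]
    return cleaned[:max_questions]
```

Note that an empty list returned by the LLM is passed through unchanged in this sketch, so "no meaningful questions" does not trigger the rule-based path.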

Rule-Based Fallback Logic¶

When LLM is unavailable, the system uses intelligent rule-based generation:

GROUP BY + COUNT queries: Suggests trend analysis and comparative questions
Category-based queries: Suggests subcategory drill-downs and effectiveness comparisons
Date-based queries: Suggests seasonal patterns and time-period comparisons
Aggregation queries: Suggests distribution analysis and factor exploration
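
The four rule families above could be approximated by inspecting the SQL text, as in this sketch. `fallback_followups`, the detection regexes, and the wording of the suggested questions are all illustrative assumptions, not the library's actual rules.

```python
import re
from typing import List


def fallback_followups(sql_query: str, max_questions: int = 3) -> List[str]:
    """Suggest follow-ups from simple SQL features; may return an empty list."""
    sql = sql_query.upper()
    has_group_by = re.search(r"\bGROUP\s+BY\b", sql) is not None
    has_count = re.search(r"\bCOUNT\s*\(", sql) is not None
    has_other_agg = re.search(r"\b(SUM|AVG|MIN|MAX)\s*\(", sql) is not None
    has_category = "CATEGORY" in sql  # assumes category-like column names
    has_date = re.search(r"DATE|TIMESTAMP|MONTH|YEAR", sql) is not None

    questions: List[str] = []
    if has_group_by and has_count:
        questions.append("How have these counts changed over time?")
    if has_category:
        questions.append("Which subcategories contribute most within each category?")
    if has_date:
        questions.append("Are there seasonal patterns in these results?")
    if has_other_agg:
        questions.append("How are the underlying values distributed?")
    return questions[:max_questions]
```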

Best Practices¶

Prompt Configuration:

# Focus on specific, actionable questions
prompts:
  generate_followup_questions:
    system: |
      Generate specific, actionable follow-up questions.
      Avoid generic questions like "What else would you like to know?"
      Focus on business insights and deeper analysis opportunities.

Integration with Chat Workflows:

- Follow-up questions consider conversation history for chat mode
- Questions adapt based on whether it's a standalone query or part of an ongoing conversation
- Conversation context helps avoid repetitive suggestions

Performance Optimizations¶

Combined Visualization Step (New in v0.6.2)¶

Ask RITA offers an optimized workflow step that combines visualization choice and data formatting in a single LLM call, saving roughly 250-400 ms of latency and about 14% of the cost per query.

How It Works¶

Instead of using two separate LLM calls:

1. choose_visualization - Choose the visualization type
2. format_data_for_visualization - Format the data

You can use a single optimized step:

- choose_and_format_visualization - Does BOTH in one LLM call!

Configuration - Choose ONE Approach¶

workflow:
  steps:
    # OPTION 1 (Recommended): Combined step - SINGLE LLM call
    choose_and_format_visualization: true       # DEFAULT: Uses optimized single call

    # OPTION 2 (Legacy): Separate steps - TWO LLM calls
    choose_visualization: false                 # Set combined to false to use these
    format_data_for_visualization: false        # Set combined to false to use these

⚠️ IMPORTANT: Only enable ONE approach at a time:

- ✅ Either choose_and_format_visualization: true (recommended)
- ✅ Or choose_visualization: true + format_data_for_visualization: true (legacy)
- ❌ Do NOT enable both combined AND separate steps

Default Behavior¶

The new defaults in ConfigManager use the optimized approach:

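from dataclasses import dataclass, field
from typing import Dict

@dataclass
class WorkflowConfig:
    # Per-step on/off switches; the combined visualization step is on by default.
    steps: Dict[str, bool] = field(default_factory=lambda: {
        # ... other steps ...
        "choose_and_format_visualization": True,  # DEFAULT: Optimized
        "choose_visualization": False,            # Legacy
        "format_data_for_visualization": False    # Legacy
    })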

Custom Prompt Configuration¶

The combined step uses the choose_and_format_visualization prompt. All example configs include this:

prompts:
  # OPTION 1: Combined prompt (for single LLM call)
  choose_and_format_visualization:
    system: |
      You are an expert data visualization assistant that BOTH recommends 
      visualizations AND formats data for them - all in a SINGLE response.

      Your task has TWO parts that must be completed together:
      1. CHOOSE the most appropriate visualization type
      2. FORMAT the data for both legacy and universal chart formats

      **Available Chart Types:**
      - bar, horizontal_bar, line, pie, scatter, area, table, none

      **You MUST provide BOTH formats in your response:**
      1. legacy_format: Backward-compatible structure
      2. universal_format: Modern UniversalChartData structure

    human: |
      **Question:** {question}
      **SQL Query:** {sql_query}
      **Data:** {num_rows} rows x {num_cols} columns
      **Sample:** {query_results_sample}
      **Full:** {query_results_full}

      Generate complete response with ALL fields.

  # OPTION 2: Separate prompts (for legacy two-step approach)
  choose_visualization:
    system: "Recommend appropriate visualization..."
    human: "Question: {question}..."

  format_data_for_visualization:
    system: "Format data for the chosen visualization..."
    human: "Visualization: {visualization}..."

Backward Compatibility¶

Existing configurations that use the separate steps continue to work without changes:

| Configuration | Behavior |
|---|---|
| ✅ `choose_and_format_visualization: true` (default) | Optimized single LLM call |
| ✅ `choose_visualization: true` + `format_data_for_visualization: true` | Legacy two LLM calls |
| ❌ Combined and separate steps all disabled | No visualization |
| ⚠️ Combined AND separate steps both enabled | ERROR - Choose ONE approach! |
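
The table can be read as a small decision function over the three step flags. The sketch below assumes the flags have already been resolved to explicit booleans; `resolve_visualization_mode` and its return labels are hypothetical, and the case where only one legacy step is enabled is not covered by the table, so the label used for it here is purely an assumption.

```python
from typing import Dict


def resolve_visualization_mode(steps: Dict[str, bool]) -> str:
    """Map the three visualization step flags to a mode label."""
    combined = steps.get("choose_and_format_visualization", False)
    choose = steps.get("choose_visualization", False)
    fmt = steps.get("format_data_for_visualization", False)

    if combined and (choose or fmt):
        # Combined AND separate steps enabled: the table marks this as an error.
        raise ValueError("Enable either the combined step or the separate steps, not both")
    if combined:
        return "combined"  # single LLM call
    if choose and fmt:
        return "legacy"  # two LLM calls
    if choose or fmt:
        return "legacy_partial"  # undocumented case; label is an assumption
    return "none"  # no visualization
```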

Benefits¶

~250-400ms faster per query (1 fewer LLM call)
~14% cheaper (saved LLM call = ~$0.0015 per query for GPT-4o)
Type-safe with Pydantic CombinedVisualizationResponse model
Explicit control via workflow steps configuration
No hidden magic - clear configuration options
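
As a back-of-the-envelope check, the two cost figures above can be combined. This sketch reads the ~$0.0015 saving as roughly 14% of the per-query cost, which is an interpretation of the quoted numbers rather than a measured result; the query volume is a made-up example.

```python
# Figures quoted in the Benefits list (approximate).
SAVED_PER_QUERY_USD = 0.0015
SAVED_FRACTION = 0.14

# Implied per-query cost before and after, if the saving is ~14% of the total.
implied_cost_before = SAVED_PER_QUERY_USD / SAVED_FRACTION
implied_cost_after = implied_cost_before - SAVED_PER_QUERY_USD


def projected_savings(queries: int) -> float:
    """Total saving in USD for a given number of queries."""
    return queries * SAVED_PER_QUERY_USD


print(f"{implied_cost_before:.4f}")          # 0.0107
print(f"{implied_cost_after:.4f}")           # 0.0092
print(f"{projected_savings(100_000):.2f}")   # 150.00
```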

Migration Guide¶

If you have existing configs with the old separate steps:

# OLD (still works, but slower)
workflow:
  steps:
    choose_visualization: true
    format_data_for_visualization: true

# NEW (recommended - faster and cheaper)
workflow:
  steps:
    choose_and_format_visualization: true
    choose_visualization: false
    format_data_for_visualization: false
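
For configs that are loaded as dictionaries, the OLD to NEW change above can be sketched as a small transformation. `migrate_steps` is a hypothetical helper, not part of the library; it only rewrites configs that enable both legacy steps and leaves everything else untouched.

```python
from typing import Dict


def migrate_steps(steps: Dict[str, bool]) -> Dict[str, bool]:
    """Return a copy of steps with the legacy pair swapped for the combined step."""
    migrated = dict(steps)  # never mutate the caller's config
    uses_legacy_pair = (
        migrated.get("choose_visualization", False)
        and migrated.get("format_data_for_visualization", False)
        and not migrated.get("choose_and_format_visualization", False)
    )
    if uses_legacy_pair:
        migrated["choose_and_format_visualization"] = True
        migrated["choose_visualization"] = False
        migrated["format_data_for_visualization"] = False
    return migrated
```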

Example Configs¶

All example configs use the optimized approach by default:

- example-configs/query-openai.yaml ✅
- example-configs/query-bigquery.yaml ✅
- example-configs/query-bigquery-advanced.yaml ✅
- example-configs/query-vertex-ai.yaml ✅
- example-configs/query-azure-openai.yaml ✅
- example-configs/query-bedrock.yaml ✅
- example-configs/query-vertex-ai-gcloud.yaml ✅
- example-configs/query-snowflake.yaml ✅