Skip to content

Research Workflow (CRISP-DM)

Ask RITA includes a full research agent that follows the CRISP-DM methodology to test hypotheses against your database using real statistical analysis — not LLM fabrication.

New in v0.13.0 — Bonferroni correction, Tukey HSD post-hoc tests, and parallel evidence execution.

Table of Contents

Overview

The ResearchAgent orchestrates a six-phase CRISP-DM workflow:

Phase Name What Happens
1 Business Understanding LLM refines the hypothesis, defines success criteria and key variables
2 Data Understanding Schema analysis identifies relevant columns, data quality notes, and limitations
3 Data Preparation SQL Agent generates evidence queries; database executes them in parallel
4 Modeling StatisticalAnalyzer runs real scipy-based tests on collected data
5 Evaluation LLM interprets statistical results → SUPPORTED / REFUTED / INCONCLUSIVE
6 Deployment LLM produces executive summary, insights, recommendations, and next steps

Key design principles:

  • Real statistics — All p-values, effect sizes, and confidence intervals come from scipy, not the LLM
  • LLM for interpretation — The LLM interprets computed results, never fabricates numbers
  • Shared infrastructure — Uses the same ConfigManager, DatabaseManager, and LLMManager as the SQL workflow

Quick Start

1. Install Dependencies

pip install askrita
# scipy is included automatically for statistical analysis

2. Create Configuration

The research workflow uses the same YAML configuration as the SQL workflow. Any valid SQL workflow config works:

database:
  connection_string: "postgresql://${DB_USER}:${DB_PASSWORD}@localhost:5432/mydb"
  max_results: 50000  # Research queries may need more rows

llm:
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.1

3. Test a Hypothesis

from askrita import ConfigManager
from askrita.research import ResearchAgent

config = ConfigManager("config.yaml")
agent = ResearchAgent(config_manager=config)

result = agent.test_hypothesis(
    research_question="Do premium customers spend more than standard customers?",
    hypothesis="Premium customers have a significantly higher average order value"
)

print(f"Conclusion: {result['conclusion']}")
print(f"Confidence: {result['confidence']}%")
print(f"Key findings: {result['insights']}")

Configuration

The research workflow shares the standard Ask RITA configuration. The only research-specific parameter is research_max_results, set at initialization:

agent = ResearchAgent(
    config_manager=config,
    research_max_results=50_000,  # Max rows per evidence query (default: 50,000)
)

This overrides database.max_results internally so evidence queries can return larger datasets for statistical analysis.

database:
  connection_string: "..."
  max_results: 50000       # Higher than typical query use
  query_timeout: 120       # Evidence queries may be complex
  cache_schema: true       # Schema is analyzed at startup

Usage Examples

Hypothesis Testing

from askrita import ConfigManager
from askrita.research import ResearchAgent

config = ConfigManager("config.yaml")
agent = ResearchAgent(config_manager=config)

result = agent.test_hypothesis(
    research_question="Is there a seasonal pattern in sales?",
    hypothesis="Q4 sales are significantly higher than other quarters"
)

# Result structure
print(f"Conclusion: {result['conclusion']}")        # SUPPORTED, REFUTED, or INCONCLUSIVE
print(f"Confidence: {result['confidence']}%")        # 0-100
print(f"Validity: {result['validity_assessment']}")

# Statistical evidence
for finding in result['statistical_findings']:
    print(f"  Test: {finding['test_name']}")
    print(f"  p-value: {finding['p_value']}")
    print(f"  Significant: {finding['is_significant']}")

# Actionable output
print(f"Insights: {result['insights']}")
print(f"Recommendations: {result['recommendations']}")
print(f"Next steps: {result['next_steps']}")

# Bonferroni correction (when multiple tests)
if result.get('bonferroni_p') is not None:
    print(f"Bonferroni-corrected p: {result['bonferroni_p']}")
    print(f"Still significant: {result['bonferroni_significant']}")

Ad-Hoc Queries

Use query() for single questions that generate SQL and return raw data:

result = agent.query("What is the average order value by customer tier?")

print(f"SQL: {result['sql']}")
print(f"Answer: {result['answer']}")
print(f"Data: {result['data']}")  # Raw rows from the database

Schema Analysis

Analyze your database schema to understand research potential:

from askrita.research import SchemaAnalyzer

report = agent.analyze_schema()

print(f"Tables: {report.total_tables}")
print(f"Columns: {report.total_columns}")
print(f"Data model: {report.data_model_type}")
print(f"Research readiness: {report.research_readiness}")

# High-value tables for analysis
for table_name in report.high_value_tables:
    table = next(t for t in report.tables if t.name == table_name)
    print(f"\n{table.name} (entity: {table.entity_type})")
    print(f"  Research value: {table.research_value}")
    print(f"  Suggestions: {table.analysis_suggestions}")

# Generate a detailed text report
detailed = agent.schema_analyzer.generate_detailed_report(report)
print(detailed)

Legacy API

The test_assumption() method is maintained for backwards compatibility but delegates to test_hypothesis():

result = agent.test_assumption(
    assumption="Premium customers have higher lifetime value",
    evidence_queries=[]  # Ignored — queries are generated automatically
)

API Reference

ResearchAgent

class ResearchAgent:
    def __init__(
        self,
        config_manager: Optional[ConfigManager] = None,
        research_max_results: int = 50_000,
        **kwargs,  # Forwarded to SQLAgentWorkflow
    ): ...

    def test_hypothesis(
        self,
        research_question: str,
        hypothesis: str,
    ) -> Dict[str, Any]: ...

    def test_assumption(
        self,
        assumption: str,
        evidence_queries: List[str],  # Ignored
    ) -> Dict[str, Any]: ...

    def query(self, question: str) -> Dict[str, Any]: ...

    def analyze_schema(self) -> SchemaAnalysisReport: ...

    @property
    def schema(self) -> str: ...

test_hypothesis Return Value

Key Type Description
research_question str Original research question
hypothesis str Hypothesis being tested
success_criteria str LLM-defined criteria for support
data_quality str Assessment of data quality
relevant_columns list Columns identified as relevant
data_limitations str Known limitations of the data
statistical_findings list List of StatisticalResult dicts
key_metrics str Summary of key metrics
bonferroni_p float Bonferroni-corrected p-value (if multiple tests)
bonferroni_significant bool Whether result holds after correction
stats_trace list Detailed trace of statistical decisions
conclusion str "SUPPORTED", "REFUTED", or "INCONCLUSIVE"
confidence int 0–100 confidence score
validity_assessment str Assessment of conclusion validity
insights list Key insights from the analysis
recommendations list Actionable recommendations
next_steps list Suggested follow-up research
queries_executed int Number of SQL queries run
errors list Any errors encountered
timestamp str ISO timestamp

SchemaAnalysisReport

@dataclass
class SchemaAnalysisReport:
    database_type: str
    total_tables: int
    total_columns: int
    analysis_timestamp: str
    tables: List[TableAnalysis]
    naming_patterns: Dict
    data_type_distribution: Dict
    high_value_tables: List[str]
    potential_fact_tables: List[str]
    potential_dimension_tables: List[str]
    suggested_relationships: List[Dict]
    schema_complexity: str
    data_model_type: str
    research_readiness: str
    analysis_steps: List[str]
    recommended_analyses: List[str]

TableAnalysis

@dataclass
class TableAnalysis:
    name: str
    full_name: str
    columns: List[ColumnAnalysis]
    description: str
    row_count_estimate: Optional[int]
    primary_keys: List[str]
    foreign_keys: List[str]
    research_value: str
    entity_type: str
    relationships: List[str]
    analysis_suggestions: List[str]

ColumnAnalysis

@dataclass
class ColumnAnalysis:
    name: str
    data_type: str
    is_nullable: bool
    is_primary_key: bool
    is_foreign_key: bool
    description: str
    research_potential: str
    statistical_type: str
    sample_queries: List[str]

How It Works

flowchart TD
    A[Research Question + Hypothesis] --> B[1. Business Understanding]
    B -->|"LLM refines hypothesis, defines success criteria and key variables"| C[2. Data Understanding]
    C -->|"SchemaAnalyzer examines tables, columns, types, relationships"| D[3. Data Preparation]
    D -->|"LLM generates evidence queries; SQL Agent produces SQL; DB executes in parallel"| E[4. Modeling]
    E -->|"StatisticalAnalyzer runs scipy tests: t-test, ANOVA, correlation, chi-square"| F[5. Evaluation]
    F -->|"LLM interprets stats → SUPPORTED / REFUTED / INCONCLUSIVE"| G[6. Deployment]
    G -->|"Executive summary, insights, recommendations, next steps"| H[Final Report]
    style A fill:#2F5496,color:#fff
    style B fill:#0288D1,color:#fff
    style C fill:#4CAF50,color:#fff
    style D fill:#00897B,color:#fff
    style E fill:#4CAF50,color:#fff
    style F fill:#0288D1,color:#fff
    style G fill:#0288D1,color:#fff
    style H fill:#7B1FA2,color:#fff

Integration with SQL Agent

The ResearchAgent constructs an internal SQLAgentWorkflow with several steps disabled (execution, formatting, visualization, follow-ups) so that the SQL path only generates and validates SQL. The research agent then executes queries directly via DatabaseManager.execute_query() in a thread pool for parallel evidence collection.

Statistical Tests

The StatisticalAnalyzer automatically selects the appropriate test based on data characteristics:

Test Selection Logic

Data Shape Normality Test Applied
2 groups, numeric Normal (Shapiro-Wilk) Welch's t-test (unequal variance)
2 groups, numeric Non-normal Mann-Whitney U
3+ groups, numeric All normal One-way ANOVA + Tukey HSD post-hoc
3+ groups, numeric Any non-normal Kruskal-Wallis
2 numeric columns Normal Pearson correlation
2 numeric columns Non-normal Spearman correlation
2+ categorical Chi-square test of independence

Effect Sizes

Each test reports an effect size with interpretation:

Test Effect Size Metric Interpretation Scale
t-test / Mann-Whitney Cohen's d Small (0.2), Medium (0.5), Large (0.8)
ANOVA Eta-squared Small (0.01), Medium (0.06), Large (0.14)
Kruskal-Wallis Epsilon-squared Small (0.01), Medium (0.06), Large (0.14)
Chi-square Cramer's V Small (0.1), Medium (0.3), Large (0.5)
Correlation r / rho Small (0.1), Medium (0.3), Large (0.5)

Multiple Testing Correction

When multiple statistical tests are run in a single hypothesis evaluation, Bonferroni correction is automatically applied. This adjusts p-values to control the family-wise error rate:

  • bonferroni_p = original_p * n_tests
  • bonferroni_significant = bonferroni_p < 0.05

Large Dataset Handling

  • Datasets over 100,000 rows are stratified-sampled to 50,000 rows
  • Normality testing (Shapiro-Wilk) uses up to 5,000 values per group
  • Large-N notes are appended to results when sample sizes exceed 10,000

StatisticalResult

@dataclass
class StatisticalResult:
    test_name: str
    test_statistic: float
    p_value: float
    is_significant: bool          # p < 0.05
    effect_size: Optional[float]
    effect_size_interpretation: Optional[str]
    confidence_interval: Optional[tuple]
    sample_sizes: Optional[Dict]
    group_means: Optional[Dict]
    group_stds: Optional[Dict]
    additional_info: Optional[Dict]

    def to_prompt_text(self) -> str:
        """Format for LLM interpretation."""

DescriptiveStats

@dataclass
class DescriptiveStats:
    variable: str
    count: int
    mean: float
    std: float
    min: float
    max: float
    median: float
    q1: float
    q3: float
    missing: int

    def to_prompt_text(self) -> str:
        """Format for LLM interpretation."""

Schema Analysis

The SchemaAnalyzer provides a structured view of your database for research planning:

report = agent.analyze_schema()

# Classification
print(report.data_model_type)       # e.g., "Star Schema", "Normalized"
print(report.schema_complexity)      # e.g., "Moderate"
print(report.research_readiness)     # e.g., "High"

# Table roles
print(report.potential_fact_tables)       # Transaction/event tables
print(report.potential_dimension_tables)  # Lookup/reference tables
print(report.suggested_relationships)     # FK-based joins

# Per-table detail
for table in report.tables:
    print(f"{table.name}: {table.entity_type} (value: {table.research_value})")
    for col in table.columns:
        print(f"  {col.name}: {col.statistical_type}{col.research_potential}")

When include_sample_data=True (the default) and the schema has 20 or fewer tables, the analyzer runs sample queries on up to 5 high-value tables to estimate row counts.

Troubleshooting

No Statistical Tests Run

Symptom: statistical_findings is empty.

  • Ensure evidence queries return enough rows (at least 2 per group for comparisons)
  • Check that the data contains numeric or categorical columns appropriate for testing
  • Aggregated data (1 row per group) cannot be tested — the trace will show "Aggregated Data — No Statistical Test"

scipy Not Available

Symptom: Results show test_name: "Descriptive Comparison (scipy not available)" with p_value: 1.0.

  • Install scipy: pip install scipy
  • scipy is included in the default askrita installation

Evidence Queries Failing

Symptom: errors list is non-empty, queries_executed is lower than expected.

  • Check database permissions for the tables involved
  • Increase database.query_timeout for complex queries
  • Increase research_max_results if queries hit the row limit

Inconclusive Results

The conclusion is INCONCLUSIVE when:

  • Sample sizes are too small for meaningful statistical testing
  • Effect sizes are negligible despite statistical significance
  • Data quality issues prevent reliable analysis
  • The hypothesis cannot be tested with the available data

See also: