Research Workflow (CRISP-DM)¶
Ask RITA includes a full research agent that follows the CRISP-DM methodology to test hypotheses against your database using real statistical analysis — not LLM fabrication.
New in v0.13.0 — Bonferroni correction, Tukey HSD post-hoc tests, and parallel evidence execution.
Table of Contents¶
- Overview
- Quick Start
- Configuration
- Usage Examples
- API Reference
- How It Works
- Statistical Tests
- Schema Analysis
- Troubleshooting
Overview¶
The ResearchAgent orchestrates a six-phase CRISP-DM workflow:
| Phase | Name | What Happens |
|---|---|---|
| 1 | Business Understanding | LLM refines the hypothesis, defines success criteria and key variables |
| 2 | Data Understanding | Schema analysis identifies relevant columns, data quality notes, and limitations |
| 3 | Data Preparation | SQL Agent generates evidence queries; database executes them in parallel |
| 4 | Modeling | StatisticalAnalyzer runs real scipy-based tests on collected data |
| 5 | Evaluation | LLM interprets statistical results → SUPPORTED / REFUTED / INCONCLUSIVE |
| 6 | Deployment | LLM produces executive summary, insights, recommendations, and next steps |
Key design principles:
- Real statistics — All p-values, effect sizes, and confidence intervals come from scipy, not the LLM
- LLM for interpretation — The LLM interprets computed results, never fabricates numbers
- Shared infrastructure — Uses the same ConfigManager, DatabaseManager, and LLMManager as the SQL workflow
Quick Start¶
1. Install Dependencies¶
2. Create Configuration¶
The research workflow uses the same YAML configuration as the SQL workflow. Any valid SQL workflow config works:
```yaml
database:
  connection_string: "postgresql://${DB_USER}:${DB_PASSWORD}@localhost:5432/mydb"
  max_results: 50000  # Research queries may need more rows

llm:
  provider: "openai"
  model: "gpt-4o"
  temperature: 0.1
```
3. Test a Hypothesis¶
```python
from askrita import ConfigManager
from askrita.research import ResearchAgent

config = ConfigManager("config.yaml")
agent = ResearchAgent(config_manager=config)

result = agent.test_hypothesis(
    research_question="Do premium customers spend more than standard customers?",
    hypothesis="Premium customers have a significantly higher average order value"
)

print(f"Conclusion: {result['conclusion']}")
print(f"Confidence: {result['confidence']}%")
print(f"Key findings: {result['insights']}")
```
Configuration¶
The research workflow shares the standard Ask RITA configuration. The only research-specific parameter is research_max_results, set at initialization:
```python
agent = ResearchAgent(
    config_manager=config,
    research_max_results=50_000,  # Max rows per evidence query (default: 50,000)
)
```
This overrides database.max_results internally so evidence queries can return larger datasets for statistical analysis.
Recommended Database Settings¶
```yaml
database:
  connection_string: "..."
  max_results: 50000   # Higher than typical query use
  query_timeout: 120   # Evidence queries may be complex
  cache_schema: true   # Schema is analyzed at startup
```
Usage Examples¶
Hypothesis Testing¶
```python
from askrita import ConfigManager
from askrita.research import ResearchAgent

config = ConfigManager("config.yaml")
agent = ResearchAgent(config_manager=config)

result = agent.test_hypothesis(
    research_question="Is there a seasonal pattern in sales?",
    hypothesis="Q4 sales are significantly higher than other quarters"
)

# Result structure
print(f"Conclusion: {result['conclusion']}")   # SUPPORTED, REFUTED, or INCONCLUSIVE
print(f"Confidence: {result['confidence']}%")  # 0-100
print(f"Validity: {result['validity_assessment']}")

# Statistical evidence
for finding in result['statistical_findings']:
    print(f"  Test: {finding['test_name']}")
    print(f"  p-value: {finding['p_value']}")
    print(f"  Significant: {finding['is_significant']}")

# Actionable output
print(f"Insights: {result['insights']}")
print(f"Recommendations: {result['recommendations']}")
print(f"Next steps: {result['next_steps']}")

# Bonferroni correction (when multiple tests)
if result.get('bonferroni_p') is not None:
    print(f"Bonferroni-corrected p: {result['bonferroni_p']}")
    print(f"Still significant: {result['bonferroni_significant']}")
```
Ad-Hoc Queries¶
Use query() for one-off questions; it generates SQL and returns the raw data:
```python
result = agent.query("What is the average order value by customer tier?")

print(f"SQL: {result['sql']}")
print(f"Answer: {result['answer']}")
print(f"Data: {result['data']}")  # Raw rows from the database
```
Schema Analysis¶
Analyze your database schema to understand research potential:
```python
report = agent.analyze_schema()

print(f"Tables: {report.total_tables}")
print(f"Columns: {report.total_columns}")
print(f"Data model: {report.data_model_type}")
print(f"Research readiness: {report.research_readiness}")

# High-value tables for analysis
for table_name in report.high_value_tables:
    table = next(t for t in report.tables if t.name == table_name)
    print(f"\n{table.name} (entity: {table.entity_type})")
    print(f"  Research value: {table.research_value}")
    print(f"  Suggestions: {table.analysis_suggestions}")

# Generate a detailed text report
detailed = agent.schema_analyzer.generate_detailed_report(report)
print(detailed)
```
Legacy API¶
The test_assumption() method is maintained for backwards compatibility but delegates to test_hypothesis():
```python
result = agent.test_assumption(
    assumption="Premium customers have higher lifetime value",
    evidence_queries=[]  # Ignored — queries are generated automatically
)
```
API Reference¶
ResearchAgent¶
```python
class ResearchAgent:
    def __init__(
        self,
        config_manager: Optional[ConfigManager] = None,
        research_max_results: int = 50_000,
        **kwargs,  # Forwarded to SQLAgentWorkflow
    ): ...

    def test_hypothesis(
        self,
        research_question: str,
        hypothesis: str,
    ) -> Dict[str, Any]: ...

    def test_assumption(
        self,
        assumption: str,
        evidence_queries: List[str],  # Ignored
    ) -> Dict[str, Any]: ...

    def query(self, question: str) -> Dict[str, Any]: ...

    def analyze_schema(self) -> SchemaAnalysisReport: ...

    @property
    def schema(self) -> str: ...
```
test_hypothesis Return Value¶
| Key | Type | Description |
|---|---|---|
| research_question | str | Original research question |
| hypothesis | str | Hypothesis being tested |
| success_criteria | str | LLM-defined criteria for support |
| data_quality | str | Assessment of data quality |
| relevant_columns | list | Columns identified as relevant |
| data_limitations | str | Known limitations of the data |
| statistical_findings | list | List of StatisticalResult dicts |
| key_metrics | str | Summary of key metrics |
| bonferroni_p | float | Bonferroni-corrected p-value (if multiple tests) |
| bonferroni_significant | bool | Whether result holds after correction |
| stats_trace | list | Detailed trace of statistical decisions |
| conclusion | str | "SUPPORTED", "REFUTED", or "INCONCLUSIVE" |
| confidence | int | 0–100 confidence score |
| validity_assessment | str | Assessment of conclusion validity |
| insights | list | Key insights from the analysis |
| recommendations | list | Actionable recommendations |
| next_steps | list | Suggested follow-up research |
| queries_executed | int | Number of SQL queries run |
| errors | list | Any errors encountered |
| timestamp | str | ISO timestamp |
SchemaAnalysisReport¶
```python
@dataclass
class SchemaAnalysisReport:
    database_type: str
    total_tables: int
    total_columns: int
    analysis_timestamp: str
    tables: List[TableAnalysis]
    naming_patterns: Dict
    data_type_distribution: Dict
    high_value_tables: List[str]
    potential_fact_tables: List[str]
    potential_dimension_tables: List[str]
    suggested_relationships: List[Dict]
    schema_complexity: str
    data_model_type: str
    research_readiness: str
    analysis_steps: List[str]
    recommended_analyses: List[str]
```
TableAnalysis¶
```python
@dataclass
class TableAnalysis:
    name: str
    full_name: str
    columns: List[ColumnAnalysis]
    description: str
    row_count_estimate: Optional[int]
    primary_keys: List[str]
    foreign_keys: List[str]
    research_value: str
    entity_type: str
    relationships: List[str]
    analysis_suggestions: List[str]
```
ColumnAnalysis¶
```python
@dataclass
class ColumnAnalysis:
    name: str
    data_type: str
    is_nullable: bool
    is_primary_key: bool
    is_foreign_key: bool
    description: str
    research_potential: str
    statistical_type: str
    sample_queries: List[str]
```
How It Works¶
```mermaid
flowchart TD
    A[Research Question + Hypothesis] --> B[1. Business Understanding]
    B -->|"LLM refines hypothesis, defines success criteria and key variables"| C[2. Data Understanding]
    C -->|"SchemaAnalyzer examines tables, columns, types, relationships"| D[3. Data Preparation]
    D -->|"LLM generates evidence queries; SQL Agent produces SQL; DB executes in parallel"| E[4. Modeling]
    E -->|"StatisticalAnalyzer runs scipy tests: t-test, ANOVA, correlation, chi-square"| F[5. Evaluation]
    F -->|"LLM interprets stats → SUPPORTED / REFUTED / INCONCLUSIVE"| G[6. Deployment]
    G -->|"Executive summary, insights, recommendations, next steps"| H[Final Report]

    style A fill:#2F5496,color:#fff
    style B fill:#0288D1,color:#fff
    style C fill:#4CAF50,color:#fff
    style D fill:#00897B,color:#fff
    style E fill:#4CAF50,color:#fff
    style F fill:#0288D1,color:#fff
    style G fill:#0288D1,color:#fff
    style H fill:#7B1FA2,color:#fff
```
Integration with SQL Agent¶
The ResearchAgent constructs an internal SQLAgentWorkflow with several steps disabled (execution, formatting, visualization, follow-ups) so that the SQL path only generates and validates SQL. The research agent then executes queries directly via DatabaseManager.execute_query() in a thread pool for parallel evidence collection.
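The parallel execution step can be sketched as follows. This is an illustrative helper, not the library's internal code; the only assumption taken from the source is that the database object exposes an execute_query() method.

```python
from concurrent.futures import ThreadPoolExecutor

def run_evidence_queries(db, queries, max_workers=4):
    """Execute independent evidence queries concurrently (illustrative sketch).

    `db` is anything with an execute_query(sql) method. A failing query is
    recorded in `errors` instead of aborting the whole batch.
    """
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(db.execute_query, q): q for q in queries}
        for future, query in futures.items():
            try:
                results.append(future.result())
            except Exception as exc:
                errors.append((query, str(exc)))
    return results, errors
```

Collecting per-query errors rather than raising matches the documented behavior that failed queries end up in the result's errors list while the rest of the evidence still flows into the statistical phase.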
Statistical Tests¶
The StatisticalAnalyzer automatically selects the appropriate test based on data characteristics:
Test Selection Logic¶
| Data Shape | Normality | Test Applied |
|---|---|---|
| 2 groups, numeric | Normal (Shapiro-Wilk) | Welch's t-test (unequal variance) |
| 2 groups, numeric | Non-normal | Mann-Whitney U |
| 3+ groups, numeric | All normal | One-way ANOVA + Tukey HSD post-hoc |
| 3+ groups, numeric | Any non-normal | Kruskal-Wallis |
| 2 numeric columns | Normal | Pearson correlation |
| 2 numeric columns | Non-normal | Spearman correlation |
| 2+ categorical | — | Chi-square test of independence |
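The first two rows of the table can be sketched with scipy directly. The helper name and the use of the Shapiro-Wilk p-value as the normality gate are illustrative, not the library's actual internals:

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Pick Welch's t-test or Mann-Whitney U based on Shapiro-Wilk normality."""
    normal = all(stats.shapiro(g).pvalue > alpha for g in (a, b))
    if normal:
        # Welch's variant: does not assume equal variances
        result = stats.ttest_ind(a, b, equal_var=False)
        return "Welch's t-test", result.pvalue
    result = stats.mannwhitneyu(a, b)
    return "Mann-Whitney U", result.pvalue
```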
Effect Sizes¶
Each test reports an effect size with interpretation:
| Test | Effect Size Metric | Interpretation Scale |
|---|---|---|
| t-test / Mann-Whitney | Cohen's d | Small (0.2), Medium (0.5), Large (0.8) |
| ANOVA | Eta-squared | Small (0.01), Medium (0.06), Large (0.14) |
| Kruskal-Wallis | Epsilon-squared | Small (0.01), Medium (0.06), Large (0.14) |
| Chi-square | Cramer's V | Small (0.1), Medium (0.3), Large (0.5) |
| Correlation | r / rho | Small (0.1), Medium (0.3), Large (0.5) |
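As one concrete instance, Cohen's d with a pooled standard deviation, mapped onto the thresholds from the table (the helper names are illustrative, not the library's API):

```python
import math

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd

def interpret_d(d):
    """Map |d| onto the Small (0.2) / Medium (0.5) / Large (0.8) scale."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"
```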
Multiple Testing Correction¶
When multiple statistical tests are run in a single hypothesis evaluation, Bonferroni correction is automatically applied. This adjusts p-values to control the family-wise error rate:
```
bonferroni_p = original_p * n_tests
bonferroni_significant = bonferroni_p < 0.05
```
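In code the correction amounts to the following (the helper name is illustrative; the cap at 1.0 is a standard refinement, since a probability cannot exceed 1):

```python
def bonferroni(p_values, alpha=0.05):
    """Multiply each p-value by the number of tests, capping at 1.0."""
    n = len(p_values)
    corrected = [min(1.0, p * n) for p in p_values]
    return corrected, [p < alpha for p in corrected]
```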
Large Dataset Handling¶
- Datasets over 100,000 rows are stratified-sampled to 50,000 rows
- Normality testing (Shapiro-Wilk) uses up to 5,000 values per group
- Large-N notes are appended to results when sample sizes exceed 10,000
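A rough sketch of proportional (stratified) downsampling, assuming rows are dicts and the grouping column is known; the helper and its signature are hypothetical, not the analyzer's actual implementation:

```python
import random

def stratified_sample(rows, group_key, target=50_000, seed=0):
    """Sample each group proportionally so the total is roughly `target` rows."""
    total = len(rows)
    if total <= target:
        return list(rows)
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)
    sampled = []
    for members in groups.values():
        # Keep each group's share of the target, never dropping a group entirely
        k = max(1, round(len(members) * target / total))
        sampled.extend(rng.sample(members, min(k, len(members))))
    return sampled
```

Sampling proportionally preserves the relative group sizes, so group comparisons on the reduced dataset remain representative.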
StatisticalResult¶
```python
@dataclass
class StatisticalResult:
    test_name: str
    test_statistic: float
    p_value: float
    is_significant: bool  # p < 0.05
    effect_size: Optional[float]
    effect_size_interpretation: Optional[str]
    confidence_interval: Optional[tuple]
    sample_sizes: Optional[Dict]
    group_means: Optional[Dict]
    group_stds: Optional[Dict]
    additional_info: Optional[Dict]

    def to_prompt_text(self) -> str:
        """Format for LLM interpretation."""
```
DescriptiveStats¶
```python
@dataclass
class DescriptiveStats:
    variable: str
    count: int
    mean: float
    std: float
    min: float
    max: float
    median: float
    q1: float
    q3: float
    missing: int

    def to_prompt_text(self) -> str:
        """Format for LLM interpretation."""
```
Schema Analysis¶
The SchemaAnalyzer provides a structured view of your database for research planning:
```python
report = agent.analyze_schema()

# Classification
print(report.data_model_type)     # e.g., "Star Schema", "Normalized"
print(report.schema_complexity)   # e.g., "Moderate"
print(report.research_readiness)  # e.g., "High"

# Table roles
print(report.potential_fact_tables)       # Transaction/event tables
print(report.potential_dimension_tables)  # Lookup/reference tables
print(report.suggested_relationships)     # FK-based joins

# Per-table detail
for table in report.tables:
    print(f"{table.name}: {table.entity_type} (value: {table.research_value})")
    for col in table.columns:
        print(f"  {col.name}: {col.statistical_type} — {col.research_potential}")
```
When include_sample_data=True (the default) and the schema has 20 or fewer tables, the analyzer runs sample queries on up to 5 high-value tables to estimate row counts.
Troubleshooting¶
No Statistical Tests Run¶
Symptom: statistical_findings is empty.
- Ensure evidence queries return enough rows (at least 2 per group for comparisons)
- Check that the data contains numeric or categorical columns appropriate for testing
- Aggregated data (1 row per group) cannot be tested — the trace will show "Aggregated Data — No Statistical Test"
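The minimum-data requirement above can be expressed as a simple guard; the helper is a hypothetical illustration of the check, not the analyzer's own code:

```python
def can_test_groups(groups, min_per_group=2):
    """Aggregated data (1 row per group) can't be tested: every group
    needs at least `min_per_group` raw values for a comparison test."""
    return all(len(values) >= min_per_group for values in groups.values())
```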
scipy Not Available¶
Symptom: Results show test_name: "Descriptive Comparison (scipy not available)" with p_value: 1.0.
- Install scipy: pip install scipy
- scipy is included in the default askrita installation
Evidence Queries Failing¶
Symptom: errors list is non-empty, queries_executed is lower than expected.
- Check database permissions for the tables involved
- Increase database.query_timeout for complex queries
- Increase research_max_results if queries hit the row limit
Inconclusive Results¶
The conclusion is INCONCLUSIVE when:
- Sample sizes are too small for meaningful statistical testing
- Effect sizes are negligible despite statistical significance
- Data quality issues prevent reliable analysis
- The hypothesis cannot be tested with the available data
See also:
- Configuration Guide — Complete YAML configuration reference
- Usage Examples — SQL and NoSQL workflow examples
- Supported Platforms — Database and LLM provider support