Running Analysis
This guide covers configuring and running behavioral analysis with slopit.
Basic Analysis
The simplest analysis uses default configuration:
from slopit import load_sessions
from slopit.pipeline import AnalysisPipeline
from slopit.behavioral import KeystrokeAnalyzer
sessions = load_sessions("data/")
pipeline = AnalysisPipeline([KeystrokeAnalyzer()])
result = pipeline.analyze(sessions)
Choosing Analyzers
Select analyzers based on your detection needs:
| Analyzer | Detects | Best For |
|---|---|---|
| KeystrokeAnalyzer | Transcription patterns | Free-text responses |
| FocusAnalyzer | Tab switching | Any task |
| TimingAnalyzer | Fast responses | Timed tasks |
| PasteAnalyzer | Copy/paste | Text entry |
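For example, a free-text writing task is typically best covered by the keystroke, paste, and focus analyzers, while a timed multiple-choice task offers little typing to analyze and leans on timing and focus instead. The sketch below builds task-specific pipelines using only the analyzers listed above; the right mix ultimately depends on your study.
from slopit.pipeline import AnalysisPipeline
from slopit.behavioral import (
    KeystrokeAnalyzer,
    FocusAnalyzer,
    TimingAnalyzer,
    PasteAnalyzer,
)

# Free-text writing: transcription, pasting, and tab switching are the main risks
free_text_pipeline = AnalysisPipeline([
    KeystrokeAnalyzer(),
    PasteAnalyzer(),
    FocusAnalyzer(),
])

# Timed multiple-choice: typing signals are sparse, so lean on timing and focus
timed_choice_pipeline = AnalysisPipeline([
    TimingAnalyzer(),
    FocusAnalyzer(),
])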
Recommended Configuration
For comprehensive detection, use all analyzers:
from slopit.behavioral import (
KeystrokeAnalyzer,
FocusAnalyzer,
TimingAnalyzer,
PasteAnalyzer,
)
pipeline = AnalysisPipeline([
KeystrokeAnalyzer(),
FocusAnalyzer(),
TimingAnalyzer(),
PasteAnalyzer(),
])
Minimal Configuration
For basic transcription detection:
pipeline = AnalysisPipeline([KeystrokeAnalyzer()])
Configuring Analyzers
Each analyzer accepts a configuration object:
Keystroke Analyzer
from slopit.behavioral import KeystrokeAnalyzer, KeystrokeAnalyzerConfig
config = KeystrokeAnalyzerConfig(
pause_threshold_ms=2000.0, # What counts as a pause
burst_threshold_ms=500.0, # Max IKI within a burst
min_keystrokes=20, # Minimum for analysis
min_iki_std_threshold=100.0, # Flag if std < this
max_ppr_threshold=0.95, # Flag if PPR > this
)
analyzer = KeystrokeAnalyzer(config)
Parameter guidance:
- min_iki_std_threshold: Higher values flag more sessions and catch subtler transcription; lower values only flag near-uniform typing. 100ms is a good starting point; reduce to 50ms to require stronger evidence before flagging (a stricter example is sketched after this list).
- min_keystrokes: Increase for longer responses to ensure reliable statistics.
- max_ppr_threshold: 0.95 means flag if final text is >95% of total keystrokes (minimal revision).
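As a concrete example of this guidance, a stricter keystroke configuration for long free-text responses might look like the sketch below. The values are illustrative starting points, not validated cut-offs.
from slopit.behavioral import KeystrokeAnalyzer, KeystrokeAnalyzerConfig

# Stricter variant: demand more keystrokes for stable statistics and
# only flag near-metronomic typing.
strict_config = KeystrokeAnalyzerConfig(
    pause_threshold_ms=2000.0,
    burst_threshold_ms=500.0,
    min_keystrokes=50,            # longer responses give more reliable IKI statistics
    min_iki_std_threshold=50.0,   # only flag very uniform inter-keystroke intervals
    max_ppr_threshold=0.95,
)
analyzer = KeystrokeAnalyzer(strict_config)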
Focus Analyzer
from slopit.behavioral import FocusAnalyzer, FocusAnalyzerConfig
config = FocusAnalyzerConfig(
max_blur_count=5, # Flag if more blurs than this
max_hidden_duration_ms=30000, # Flag if hidden > 30s
blur_paste_window_ms=5000, # Detect paste within 5s of refocus
)
analyzer = FocusAnalyzer(config)
Parameter guidance:
- max_blur_count: Normal users occasionally switch tabs. 5 is permissive; reduce to 2-3 for stricter detection.
- max_hidden_duration_ms: Long hidden periods may indicate using another tool. 30s is a reasonable threshold; a more permissive variant for open-book tasks is sketched after this list.
- blur_paste_window_ms: Time window for detecting "copy from AI, paste into task" pattern.
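Conversely, if your task legitimately sends participants to other tabs (an open-book task, for instance), a more permissive configuration keeps ordinary tab switching from being flagged. The values below are illustrative only.
from slopit.behavioral import FocusAnalyzer, FocusAnalyzerConfig

# Permissive variant for tasks where leaving the tab is expected.
open_book_config = FocusAnalyzerConfig(
    max_blur_count=15,              # tolerate frequent tab switches
    max_hidden_duration_ms=120000,  # allow up to 2 minutes away from the tab
    blur_paste_window_ms=5000,      # still watch for paste-right-after-refocus
)
analyzer = FocusAnalyzer(open_book_config)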
Timing Analyzer
from slopit.behavioral import TimingAnalyzer, TimingAnalyzerConfig
config = TimingAnalyzerConfig(
min_rt_per_char_ms=20.0, # Minimum realistic typing speed
max_rt_cv_threshold=0.1, # Flag if CV < 0.1
instant_response_threshold_ms=2000, # Instant response threshold
instant_response_min_chars=100, # Only for long responses
)
analyzer = TimingAnalyzer(config)
Parameter guidance:
- min_rt_per_char_ms: 20ms/char = 3000 chars/minute (about 600 WPM), faster than any human typist; a conversion helper is sketched after this list.
- instant_response_threshold_ms: 2 seconds is very fast for composing text.
- instant_response_min_chars: Only flag instant responses for substantial text.
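If you prefer to reason in words per minute, you can derive min_rt_per_char_ms from a WPM ceiling. The helper below is a hypothetical convenience, not part of slopit, and assumes the usual 5-characters-per-word convention; the remaining parameters reuse the values shown above.
from slopit.behavioral import TimingAnalyzerConfig

def ms_per_char_from_wpm(wpm: float, chars_per_word: float = 5.0) -> float:
    """Convert a words-per-minute ceiling into a per-character threshold in milliseconds."""
    return 60_000.0 / (wpm * chars_per_word)

# 600 WPM is far beyond any human typist and maps to the 20 ms/char default.
config = TimingAnalyzerConfig(
    min_rt_per_char_ms=ms_per_char_from_wpm(600),  # -> 20.0
    max_rt_cv_threshold=0.1,
    instant_response_threshold_ms=2000,
    instant_response_min_chars=100,
)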
Paste Analyzer
from slopit.behavioral import PasteAnalyzer, PasteAnalyzerConfig
config = PasteAnalyzerConfig(
large_paste_threshold=50, # Flag pastes > 50 chars
suspicious_preceding_keystrokes=5, # Flag if few keystrokes before
)
analyzer = PasteAnalyzer(config)
Parameter guidance:
- large_paste_threshold: What counts as a "large" paste. 50 chars is about a sentence.
- suspicious_preceding_keystrokes: Legitimate paste usually follows some typing. 5 is permissive; a stricter variant is sketched after this list.
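For short-answer tasks where any sizable paste is suspect, both parameters can be tightened. Again, the values are illustrative, not validated cut-offs.
from slopit.behavioral import PasteAnalyzer, PasteAnalyzerConfig

# Stricter variant for short answers.
strict_paste_config = PasteAnalyzerConfig(
    large_paste_threshold=25,           # roughly a phrase rather than a full sentence
    suspicious_preceding_keystrokes=10, # pastes with little prior typing are flagged
)
analyzer = PasteAnalyzer(strict_paste_config)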
Pipeline Configuration
Aggregation Strategies
Choose how flags from multiple analyzers are combined:
from slopit.pipeline import PipelineConfig
# Flag if ANY analyzer flags (most sensitive)
config = PipelineConfig(aggregation="any")
# Flag if MAJORITY of analyzers flag (balanced)
config = PipelineConfig(aggregation="majority")
# Use confidence-weighted voting (recommended)
config = PipelineConfig(aggregation="weighted")
Strategy comparison:
| Strategy | Sensitivity | Specificity | Use Case |
|---|---|---|---|
| any | High | Low | Screening, when false positives are acceptable |
| majority | Medium | Medium | Balanced detection |
| weighted | Adaptive | Adaptive | General purpose, production use |
Filtering Thresholds
Control which flags are included in aggregation:
config = PipelineConfig(
aggregation="weighted",
severity_threshold="low", # Include low, medium, high
confidence_threshold=0.5, # Only flags with >= 50% confidence
)
Severity levels:
- info: Informational, not necessarily problematic
- low: Minor indicator, may be false positive
- medium: Moderate concern
- high: Strong indicator of AI assistance
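To see which flags a given pair of thresholds would keep, you can also filter a verdict's flags by hand. The sketch below uses only the flag fields described later in this guide and assumes severity values are the plain strings listed above.
# Rank severities so they can be compared against a threshold.
SEVERITY_ORDER = {"info": 0, "low": 1, "medium": 2, "high": 3}

def kept_flags(verdict, severity_threshold="low", confidence_threshold=0.5):
    """Return the flags that would survive the given filtering thresholds."""
    return [
        flag
        for flag in verdict.flags
        if SEVERITY_ORDER[flag.severity] >= SEVERITY_ORDER[severity_threshold]
        and flag.confidence >= confidence_threshold
    ]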
Example Configurations
High sensitivity (catch more, accept more false positives):
config = PipelineConfig(
aggregation="any",
severity_threshold="info",
confidence_threshold=0.3,
)
Balanced (recommended default):
config = PipelineConfig(
aggregation="weighted",
severity_threshold="low",
confidence_threshold=0.5,
)
High specificity (fewer false positives, may miss some):
config = PipelineConfig(
aggregation="majority",
severity_threshold="medium",
confidence_threshold=0.7,
)
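A convenient pattern is to keep these presets in a dictionary and select one per study phase; the preset names below are arbitrary.
from slopit.pipeline import PipelineConfig

# Named presets mirroring the examples above; pick one per study phase.
PRESETS = {
    "screening": PipelineConfig(
        aggregation="any", severity_threshold="info", confidence_threshold=0.3
    ),
    "default": PipelineConfig(
        aggregation="weighted", severity_threshold="low", confidence_threshold=0.5
    ),
    "conservative": PipelineConfig(
        aggregation="majority", severity_threshold="medium", confidence_threshold=0.7
    ),
}
config = PRESETS["default"]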
Interpreting Results
Verdict Status
result = pipeline.analyze(sessions)
for session_id, verdict in result.verdicts.items():
    if verdict.status == "flagged":
        # Strong evidence of AI assistance
        print(f"{session_id}: FLAGGED (confidence: {verdict.confidence:.2f})")
    elif verdict.status == "suspicious":
        # Some evidence, requires review
        print(f"{session_id}: SUSPICIOUS")
    else:
        # No significant evidence
        print(f"{session_id}: Clean")
Understanding Flags
Each flag provides:
- type: The detection signal (e.g., "low_iki_variance")
- analyzer: Which analyzer produced it
- severity: How serious the signal is
- message: Human-readable description
- confidence: How confident the analyzer is (0.0 to 1.0)
- evidence: Data supporting the flag
- trial_ids: Which trials triggered the flag
for flag in verdict.flags:
    print(f"\n{flag.type} ({flag.severity})")
    print(f" Analyzer: {flag.analyzer}")
    print(f" Message: {flag.message}")
    print(f" Confidence: {flag.confidence:.2f}")
    print(f" Evidence: {flag.evidence}")
    print(f" Trials: {flag.trial_ids}")
Per-Trial Analysis
For detailed per-trial results:
for analyzer_name, results in result.analyzer_results.items():
    print(f"\n{analyzer_name}:")
    for analysis in results:
        for trial_data in analysis.trials:
            trial_id = trial_data["trial_id"]
            metrics = trial_data["metrics"]
            flags = trial_data["flags"]
            print(f" {trial_id}: {len(flags)} flags")
Exporting Results
To JSON
import json
with open("results.json", "w") as f:
    json.dump(result.to_dict(), f, indent=2, default=str)
To CSV
from slopit.pipeline import CSVExporter
exporter = CSVExporter()
# Session-level results
exporter.export(result, "results.csv")
# Individual flags
exporter.export_flags(result, "flags.csv")
Text Report
from slopit.pipeline import TextReporter
reporter = TextReporter()
# Full report
print(reporter.generate(result))
# Summary table
reporter.print_summary(result)
Batch Processing
For large datasets:
from pathlib import Path
from slopit import load_session
from slopit.pipeline import AnalysisPipeline, CSVExporter
pipeline = AnalysisPipeline([...])
exporter = CSVExporter()
# Process in batches
batch_size = 100
paths = list(Path("data/").glob("*.json"))
for i in range(0, len(paths), batch_size):
    batch_paths = paths[i:i + batch_size]
    sessions = [load_session(p) for p in batch_paths]
    result = pipeline.analyze(sessions)
    # Write one CSV per batch
    exporter.export(result, f"results_batch_{i//batch_size}.csv")
    print(f"Processed batch {i//batch_size + 1}")
Using the CLI
For quick analysis without writing code:
# Basic analysis
slopit analyze data/
# Summary only
slopit analyze data/ --summary
# Save to JSON
slopit analyze data/ --output results.json
# Export to CSV
slopit analyze data/ --csv results.csv
# Specific analyzers
slopit analyze data/ --analyzers keystroke,focus
# Different aggregation
slopit analyze data/ --aggregation majority
Best Practices
- Start with default configuration and adjust based on results
- Use weighted aggregation for production
- Review suspicious sessions manually before exclusion
- Combine multiple analyzers for robust detection
- Consider your population when setting thresholds
- Document your configuration for reproducibility (a minimal sketch follows this list)
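One lightweight way to document a run is to save the settings you used next to the results. The dictionary below simply records the values chosen earlier in this guide; adapt it to whatever you actually configure.
import json

# Record the settings used for this analysis run alongside the results.
run_config = {
    "analyzers": ["keystroke", "focus", "timing", "paste"],
    "aggregation": "weighted",
    "severity_threshold": "low",
    "confidence_threshold": 0.5,
    "keystroke": {"min_iki_std_threshold": 100.0, "min_keystrokes": 20},
}
with open("analysis_config.json", "w") as f:
    json.dump(run_config, f, indent=2)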