A/B Testing & Predictive Intelligence
Systematic experiments drive every content decision. No guesswork — only statistically validated strategies.
Experiments Run: 25
Significant Results: 12
Active Experiments: 5
Avg Sample Size: n=71
Completed Experiments
SIGNIFICANT

| Experiment | Control (A) | Variant (B) | Lift | p-value | Winner | 95% CI | n | Conclusion |
|---|---|---|---|---|---|---|---|---|
| Question vs Statement Hooks | Statement opening: 312 | Question opening: 847 | +171.5% | 0.003 (highly sig.) | B | 112.3% to 230.7% | 89 | Question-style hooks dramatically outperform statements. Adopted as default f... |
| Thread Length: 5 vs 7 Tweets | 5-tweet thread: 534 | 7-tweet thread: 589 | +10.3% | 0.031 (significant) | B | 1.2% to 19.4% | 112 | 7-tweet threads marginally outperform 5-tweet threads. Effect is small but co... |
| Data Visualization in Thread | Text-only data: 412 | Inline chart + text: 723 | +75.5% | 0.008 (highly sig.) | B | 48.2% to 102.8% | 67 | Charts increase engagement substantially. Now included by default for all dat... |
| Source Citation Style | Sources at end: 0.72 | Inline source links: 0.84 | +16.7% | 0.022 (significant) | B | 5.1% to 28.3% | 54 | Inline citations boost perceived credibility. Adopted for all threads with sc... |
| Posting Time: Morning vs Afternoon | 08:00-10:00 UTC: 4200 | 14:00-16:00 UTC: 6100 | +45.2% | 0.011 (significant) | B | 22.8% to 67.6% | 78 | Afternoon UTC posting catches both EU evening and US morning audiences. Set a... |
| Milestone Framing | Standard framing: 312 | Milestone/record framing: 691 | +121.5% | 0.012 (significant) | B | 68.4% to 174.6% | 54 | Milestone framing more than doubles engagement. Now auto-applied when data mi... |
| Carbon Price Context | Price only: 289 | Price + historical comparison: 467 | +61.6% | 0.018 (significant) | B | 28.3% to 94.9% | 62 | Historical context significantly improves carbon price thread performance. Al... |
| Emoji Usage in Hooks | No emojis: 478 | 1-2 relevant emojis in hook: 512 | +7.1% | 0.142 (not sig.) | None | -3.4% to 17.6% | 85 | No statistically significant difference. Emoji usage does not meaningfully af... |
| Tagging Authors in Threads | No author tags: 356 | Tag paper authors when on X: 623 | +75.0% | 0.006 (highly sig.) | B | 38.2% to 111.8% | 48 | Tagging authors significantly boosts engagement via retweets. Implemented for... |
| Thread Conclusion Style | Summary conclusion: 8.2 | Call-to-action conclusion: 14.7 | +79.3% | 0.009 (highly sig.) | B | 34.1% to 124.5% | 72 | CTA conclusions nearly double reply counts. Adopted for threads targeting com... |
| Single vs Multi-Topic Threads | Single focused topic: 445 | Two related topics synthesized: 461 | +3.6% | 0.287 (not sig.) | None | -6.8% to 14.0% | 58 | Multi-topic synthesis does not significantly outperform single-topic. Focus o... |
| Uncertainty Communication Style | Numeric confidence intervals: 0.81 | Plain language uncertainty: 0.79 | -2.5% | 0.412 (not sig.) | None | -9.3% to 4.3% | 64 | No significant difference in perceived credibility between numeric and plain ... |
| Platform-Specific Formatting | Same format cross-platform: 387 | Platform-optimized formatting: 542 | +40.1% | 0.015 (significant) | B | 16.7% to 63.5% | 90 | Platform-specific formatting yields significant gains. LinkedIn prefers longe... |
| Correction Acknowledgment Placement | Correction at thread end: 0.76 | Correction as pinned reply: 0.88 | +15.8% | 0.027 (significant) | B | 3.2% to 28.4% | 42 | Pinned correction replies boost credibility perception. Adopted as standard c... |
| Weekend vs Weekday Publication | Weekday only: 5800 | Include weekend posts: 4100 | -29.3% | 0.004 (highly sig.) | A | -42.1% to -16.5% | 96 | Weekend posts significantly underperform. Pipeline now avoids Saturday/Sunday... |
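
The Lift column and the p-value labels can be reproduced from the raw values in the table above. The sketch below is illustrative only and is not the dashboard's own code: `relative_lift` and `significance_label` are hypothetical names, and the 0.01 cutoff for "highly sig." is an assumption inferred from the rows (0.009 is labelled "highly sig.", 0.011 is labelled "significant").

```python
def relative_lift(control: float, variant: float) -> float:
    """Percentage change of the variant metric relative to the control metric."""
    if control == 0:
        raise ValueError("control metric must be non-zero")
    return (variant - control) / control * 100.0


def significance_label(p_value: float) -> str:
    """Map a p-value to the label shown next to it in the table.

    Assumed thresholds: p < 0.01 -> "highly sig.", p < 0.05 -> "significant".
    """
    if p_value < 0.01:
        return "highly sig."
    if p_value < 0.05:
        return "significant"
    return "not sig."


# A few rows copied from the table: (experiment, control, variant, p-value)
rows = [
    ("Question vs Statement Hooks", 312, 847, 0.003),
    ("Thread Length: 5 vs 7 Tweets", 534, 589, 0.031),
    ("Emoji Usage in Hooks", 478, 512, 0.142),
    ("Weekend vs Weekday Publication", 5800, 4100, 0.004),
]

for name, control, variant, p in rows:
    lift = relative_lift(control, variant)
    print(f"{name}: {lift:+.1f}% ({significance_label(p)})")
```

Note that lift alone says nothing about reliability: the emoji experiment shows a positive lift but its p-value keeps it in the "not sig." bucket.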
Active Experiments
RUNNING
| Experiment | Control (A) | Variant (B) | Metric | Started | Samples |
|---|---|---|---|---|---|
| Expert Quote Inclusion | No expert quotes | 1-2 expert quotes per thread | engagement | Feb 4, 2026 | 23/60 |
| Cross-Domain Linking | Single-topic thread | Cross-reference related topics | profile_visits | Feb 7, 2026 | 14/50 |
| Follow-Up Timing | No follow-up thread | 24h follow-up with new data | retention | Feb 9, 2026 | 8/40 |
| Newsletter Preview as Hook | Standard thread hook | Thread hook teasing newsletter deep-dive | newsletter_signups | Feb 10, 2026 | 6/50 |
| Bluesky vs X Simultaneous Posting | X-first, Bluesky 2h delay | Simultaneous cross-post | combined_engagement | Feb 11, 2026 | 3/45 |
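
The sample counts for the running experiments can be turned into a completion percentage and a remaining-sample count. This is a minimal sketch that assumes a figure such as 23/60 means "samples collected / target sample size"; the `progress` helper is a hypothetical name introduced here.

```python
# (experiment, samples collected, target sample size), copied from the list above
active = [
    ("Expert Quote Inclusion", 23, 60),
    ("Cross-Domain Linking", 14, 50),
    ("Follow-Up Timing", 8, 40),
    ("Newsletter Preview as Hook", 6, 50),
    ("Bluesky vs X Simultaneous Posting", 3, 45),
]


def progress(collected: int, target: int) -> float:
    """Share of the target sample size collected so far, as a percentage."""
    if target <= 0:
        raise ValueError("target sample size must be positive")
    return collected / target * 100.0


for name, collected, target in active:
    remaining = target - collected
    print(f"{name}: {progress(collected, target):.1f}% complete, {remaining} samples remaining")
```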
Archived (Non-Significant)
p ≥ 0.05
Experiments that completed but did not reach statistical significance. These null results are just as informative: they tell us what doesn't matter.
| Experiment | A | B | Lift | p-value | 95% CI | n | Conclusion |
|---|---|---|---|---|---|---|---|
| Hashtag Count: 2 vs 5 | 2 hashtags per thread: 4850 | 5 hashtags per thread: 4920 | +1.4% | 0.681 | -8.2% to 11.0% | 74 | Additional hashtags provide no measurable impression gain. Archived; using 2-3 hashtags for readability. |
| Formal vs Conversational Tone | Academic formal tone: 398 | Conversational but precise tone: 421 | +5.8% | 0.198 | -4.1% to 15.7% | 80 | Tone difference does not significantly affect engagement. Maintaining current balanced tone. |
| Thread Numbering Style | 1/7 style numbering: 0.62 | No numbering: 0.59 | -4.8% | 0.334 | -14.2% to 4.6% | 66 | Thread numbering does not significantly affect read-through rate. Keeping numbering for clarity. |
| Link Placement: Mid-Thread vs End | Source links in tweet 3-4: 34.2 | All source links in final tweet: 31.8 | -7.0% | 0.245 | -18.9% to 4.9% | 56 | Link placement position has no significant effect on click-through. Using inline links per citation style experiment. |
| Alt Text Detail Level for Charts | Brief alt text (key takeaway only): 502 | Detailed alt text (full data description): 518 | +3.2% | 0.478 | -7.6% to 14.0% | 44 | Alt text detail level does not significantly affect general engagement. Using detailed alt text for accessibility. |
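
A quick sanity check on the archived rows: when a two-sided test and its 95% confidence interval come from the same procedure, p ≥ 0.05 should go together with an interval that contains zero. The sketch below checks only that agreement. The dashboard does not say which statistical test produced these numbers, so this is a consistency check on the reported figures, not a recomputation; `ci_excludes_zero` is a hypothetical helper name.

```python
ALPHA = 0.05  # significance threshold used throughout the dashboard


def ci_excludes_zero(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0


# (experiment, p-value, CI lower bound %, CI upper bound %), copied from the archived table
archived = [
    ("Hashtag Count: 2 vs 5", 0.681, -8.2, 11.0),
    ("Formal vs Conversational Tone", 0.198, -4.1, 15.7),
    ("Thread Numbering Style", 0.334, -14.2, 4.6),
    ("Link Placement: Mid-Thread vs End", 0.245, -18.9, 4.9),
    ("Alt Text Detail Level for Charts", 0.478, -7.6, 14.0),
]

for name, p, lower, upper in archived:
    # The p-value verdict and the interval verdict should match.
    consistent = (p < ALPHA) == ci_excludes_zero(lower, upper)
    status = "consistent" if consistent else "INCONSISTENT"
    print(f"{name}: p={p}, CI=[{lower}%, {upper}%] -> {status}")
```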
Phase 22 Predictive Intelligence · 12/25 experiments reached statistical significance (p < 0.05) · 5 archived (non-significant)
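
The headline figures can be cross-checked against the tables. The sketch below computes the share of significant experiments and the mean of the n column from the 15 rows of the Completed Experiments table. That the "n=71" headline is the mean of exactly those rows is an observation from the numbers on this page, not a documented definition.

```python
from statistics import mean

# n column of the 15 rows in the Completed Experiments table, in table order
completed_n = [89, 112, 67, 54, 78, 54, 62, 85, 48, 72, 58, 64, 90, 42, 96]

significant, total = 12, 25  # from the summary line
share_significant = significant * 100 / total

print(f"Significant: {significant}/{total} = {share_significant:.0f}%")
print(f"Mean sample size over completed rows: {mean(completed_n):.1f}")
```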