Platform-Level Analysis

Platform Comparison

This view compares 4 AI platforms, each with 3 evaluated models. Scores are aggregated using the selected scoring method and deployment profile.

Scoring method: Composite Average (mean of each model's weighted composite score)

1. xAI (Grok): 92.2, Excellent. 3 models; spread 8.1 pts (best 95.4, floor 87.3). Flags: CF-03, CF-05 (3 in total).
2. Anthropic (Claude): 85.0, Good. 3 models; spread 3.3 pts (best 86.7, floor 83.4). Flags: CF-05, CF-03 (3 in total).
3. Perplexity (Sonar): 71.2, Acceptable. 3 models; spread 5.7 pts (best 73.4, floor 67.7). Flags: CF-05 (3 in total).
4. OpenAI (ChatGPT / GPT): 68.0, Below Standard. 3 models; spread 18.4 pts (best 75.2, floor 56.8). Flags: CF-03 (1 in total).
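The spread on each card appears to be the best model composite minus the floor composite, and all four reported spreads are consistent with that reading. A minimal Python sketch, assuming that definition (the `cards` dict and `spread` helper are illustrative names, not part of the dashboard), reproduces the spreads from the card values:

```python
# Best and floor composite scores per platform, copied from the cards above.
cards = {
    "xAI": (95.4, 87.3),
    "Anthropic": (86.7, 83.4),
    "Perplexity": (73.4, 67.7),
    "OpenAI": (75.2, 56.8),
}


def spread(best: float, floor: float) -> float:
    """Assumed definition: gap between the best and lowest model composite."""
    # Round to one decimal to match the dashboard's display precision.
    return round(best - floor, 1)


spreads = {name: spread(best, floor) for name, (best, floor) in cards.items()}
for name, value in spreads.items():
    print(f"{name}: spread {value} pts")
```

A large spread (OpenAI's 18.4 pts) signals that the platform average hides a wide gap between its strongest and weakest model.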

Category Profile — All Platforms

[Radar chart: 11-category comparison (Composite Average) of xAI, Anthropic, Perplexity, and OpenAI across Clinical Safety, Med. Accuracy, Calibration, Mental Health Safety, Evidence Quality, Personalization, Communication, Bias & Fairness, Privacy & Trust, Usability, and Robustness]

Scenario Performance — Platform Level

[Chart: sub-scores (0 to 100, Composite Average) of xAI, Anthropic, Perplexity, and OpenAI across 4 deployment scenarios: Consumer, Clinician, Benchmark, and Med. Search]

Category Score Matrix — Platform × Category

Scores are computed with the Composite Average method. Color intensity reflects performance tier.

Category                   xAI     Anthropic  Perplexity  OpenAI
C1  Clinical Safety        92.5%   84.4%      82.3%       75.3%
C2  Med. Accuracy          92.5%   93.2%      81.7%       87.8%
C3  Calibration            87.2%   60.0%      63.9%       60.0%
C4  Mental Health Safety   97.0%   96.4%      93.3%       69.1%
C5  Evidence Quality       82.5%   66.7%      43.3%       75.0%
C6  Personalization        92.4%   88.6%      71.4%       60.0%
C7  Communication          96.2%   97.1%      39.0%       49.5%
C8  Bias & Fairness        98.2%   88.9%      82.2%       66.7%
C9  Privacy & Trust        90.0%   92.5%      75.0%       75.0%
C10 Usability              93.3%   74.7%      49.3%       38.7%
C11 Robustness             93.3%   97.3%      98.7%       84.0%
Composite                  92.2    85.0       71.2        68.0

Individual Model Breakdown by Platform

Scoring Method Reference

Composite Average: the mean of each model's weighted composite score.
Superscore: takes the best score in each category across all of the platform's models, then computes the composite from those per-category bests.
Best Model: the highest single-model composite score within the platform.
Floor Score: the lowest single-model composite score, i.e. the platform's minimum guarantee across its evaluated models.
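The four aggregation methods can be sketched in Python as below. This is a minimal illustration under stated assumptions: a model's weighted composite is taken to be a normalized weighted mean of its category scores, and the category names, weights, and model scores are hypothetical placeholders, not the dashboard's actual data or weighting.

```python
from statistics import mean

# Hypothetical category weights; the dashboard's real weights are not shown.
WEIGHTS: dict[str, float] = {"clinical_safety": 5, "accuracy": 3, "usability": 2}


def composite(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Assumed model composite: weighted mean of category scores."""
    total = sum(weights.values())
    return sum(scores[cat] * w for cat, w in weights.items()) / total


def composite_average(models: dict[str, dict[str, float]]) -> float:
    """Mean of each model's weighted composite score."""
    return mean(composite(s) for s in models.values())


def superscore(models: dict[str, dict[str, float]]) -> float:
    """Best score per category across the platform's models, then the composite."""
    best_per_category = {cat: max(s[cat] for s in models.values()) for cat in WEIGHTS}
    return composite(best_per_category)


def best_model(models: dict[str, dict[str, float]]) -> float:
    """Highest single-model composite within the platform."""
    return max(composite(s) for s in models.values())


def floor_score(models: dict[str, dict[str, float]]) -> float:
    """Lowest single-model composite within the platform."""
    return min(composite(s) for s in models.values())


# Hypothetical platform with two models (illustrative numbers only).
platform = {
    "model_a": {"clinical_safety": 90.0, "accuracy": 80.0, "usability": 70.0},
    "model_b": {"clinical_safety": 70.0, "accuracy": 90.0, "usability": 90.0},
}

for label, method in [
    ("Composite Average", composite_average),
    ("Superscore", superscore),
    ("Best Model", best_model),
    ("Floor Score", floor_score),
]:
    print(f"{label}: {method(platform):.1f}")
```

With nonnegative weights, the methods are ordered Superscore ≥ Best Model ≥ Composite Average ≥ Floor Score, because the per-category maximum dominates every individual model; the hypothetical numbers above give 90.0, 83.0, 81.5, and 80.0 respectively.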