Model Comparison
| Model | Suites | Overall Score | Correct Alerts | Threats Caught | Confidence | False Alarm Rate | Latency | Cost |
|---|---|---|---|---|---|---|---|---|
| grok-4.1 | stranger-meeting, health-risk, nsfw, grooming-real, grooming | 79.9% | 86.7% | 80.3% | 0.740 | 14.1% | 1310ms | $4.7531 |
| grok-3 | stranger-meeting, health-risk, nsfw, grooming-real, grooming | 59.5% | 66.6% | 59.4% | 0.748 | 14.2% | 1841ms | $3.8808 |
| claude-opus-4.6 | stranger-meeting, health-risk, nsfw, grooming-real, grooming | 42.2% | 51.3% | 36.3% | 0.709 | 6.2% | 3043ms | $8.4057 |
| gemini-3-pro | stranger-meeting, health-risk, nsfw, grooming-real, grooming | 0.0% | 0.0% | 0.0% | 0.718 | 0.0% | 5255ms | $3.1661 |
| gemini-2.5-pro | stranger-meeting, health-risk, nsfw, grooming-real, grooming | 0.0% | 0.0% | 0.0% | 0.718 | 0.0% | 3319ms | $2.1123 |
| gpt-5 | stranger-meeting, health-risk, nsfw, grooming-real, grooming | 0.0% | 0.0% | 0.0% | 0.718 | 0.0% | 484ms | $2.5889 |
