
AI MODEL & VENDOR EVALUATION
Independent Due Diligence Before You Buy or Ship AI
When you buy an AI tool, you inherit its risk, and you're often the deployer under the EU AI Act. We run independent due diligence on the model and the vendor so you sign with your eyes open.
We assess security, model and data provenance, your EU AI Act role, GDPR terms, accuracy, human oversight, support, and exit. The free checklist is your triage tool. This is the done-for-you evaluation.
THE REALITY
Buy an AI Tool, Inherit Its Risk
A model is only as safe as the vendor behind it, and their problems become yours the moment you deploy. Standard procurement checklists were built for software licences, so they miss the AI-specific questions that decide whether you're exposed.
What Procurement Misses
- Who's the deployer under the EU AI Act, and what Article 26 duties attach to you
- Whether your inputs quietly train the vendor's model
- Whether the data processing agreement actually meets GDPR Article 28
- Whether the vendor tests against the OWASP LLM Top 10 at all
- How accuracy is evaluated, and whether the number survives your use case
- What the exit path is, and who holds your data on the way out
What Due Diligence Answers
- Your exact role as provider or deployer, with the duties spelled out
- Verified provenance for the model and the training data
- A data processing agreement checked against GDPR Article 28
- Security tested against the OWASP LLM Top 10, with the vendor's SOC 2 claims checked
- Accuracy and evaluation mapped to NIST AI RMF and ISO/IEC 42001
- A red, amber, green verdict a board can act on
OUR APPROACH
Scope, Assess, Rate, Recommend
We're a Responsible AI advisory that also delivers, so we don't just hand you a report. We scope against your use case, score the model and vendor against a fixed rubric, mark each area red, amber, or green, and then support the decision through the vendor review.
Scope Against Your Use Case
The right answer depends on what you're actually deploying. A support chatbot and a credit-decision model carry different risk, so we start by pinning down the use case, the data it touches, and the decisions it drives. That's what tells us which controls matter and how hard to press on each one.
Assess Against a Rubric
We score the model and the vendor against a fixed rubric: security, provenance, EU AI Act role, GDPR terms, accuracy and evaluation, human oversight, support and SLAs, and exit. Every score traces to evidence you can check, not a vendor's marketing claim.
Red, Amber, Green
Each area gets a rating and a plain reason behind it. Green means proven and low risk. Amber means fixable with conditions we spell out. Red means walk away or renegotiate before you sign. You get one page a board can read and a detailed annex your technical team can defend.
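If your team tracks ratings in a script rather than a spreadsheet, the red, amber, green scheme can be sketched as a small data structure. This is a minimal illustration, not our rubric tooling: the `Rating` and `AreaScore` names, the example rows, and the worst-rating-wins rule in `overall_verdict` are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rating(IntEnum):
    # Ordered so that a higher value means more risk.
    GREEN = 1  # proven and low risk
    AMBER = 2  # fixable with stated conditions
    RED = 3    # walk away or renegotiate before signing


@dataclass(frozen=True)
class AreaScore:
    area: str      # rubric area, e.g. "Security"
    rating: Rating
    reason: str    # the plain reason behind the rating
    evidence: str  # where a reader can check it


def overall_verdict(scores: list[AreaScore]) -> Rating:
    """Assumed rule for illustration: the worst area rating sets the verdict."""
    if not scores:
        raise ValueError("no areas scored")
    return max(score.rating for score in scores)


# Hypothetical example data, not a real assessment.
scores = [
    AreaScore("Security", Rating.GREEN, "SOC 2 report supplied", "report on file"),
    AreaScore("GDPR terms", Rating.AMBER, "sub-processors not named", "DPA review"),
    AreaScore("Exit", Rating.GREEN, "export path documented", "contract annex"),
]
print(overall_verdict(scores).name)  # AMBER
```

Keeping `reason` and `evidence` on every row is the point: a rating without both is an opinion, not a finding.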
Recommend and Support the Decision
We give you a clear recommendation and the reasoning, then stay in the room. That means sitting in the vendor review, drafting the questions procurement should ask, and checking the answers when they come back. The call is yours. We make sure it's an informed one.
WHAT WE ASSESS
Six Areas That Decide Whether You're Exposed
Every evaluation covers the same six areas, scored against named frameworks so the verdict traces to evidence, not opinion. We press hardest where your use case carries the most risk.
Security Review
We test the vendor against the OWASP LLM Top 10 and their claimed SOC 2 controls. That covers prompt injection, data leakage, insecure output handling, access controls, and how they handle your data in transit and at rest. If they've never heard of the OWASP LLM Top 10, that tells you something on its own.
Data and Model Provenance
Where did the training data come from, and can they prove it? We check the model's lineage, whether your inputs get used for training, retention and deletion terms, and whether the vendor can actually answer these questions or just deflects. Provenance you can't verify is a liability you inherit.
EU AI Act Role
Buying an AI tool often makes you the deployer under the EU AI Act, with duties under Article 26 that the vendor doesn't carry for you. We establish who is the provider and who is the deployer, what obligations sit with each, and whether the system falls into a high-risk category that changes what you both have to do.
GDPR and the DPA
If the vendor processes personal data, GDPR Article 28 sets what the data processing agreement has to contain. We read the actual DPA, not the sales deck, and flag missing sub-processor terms, international transfer gaps, and clauses that quietly shift liability onto you.
Accuracy and Evaluation
A model that's confidently wrong is worse than no model. We look at how the vendor evaluates accuracy, what benchmarks they cite, how they handle hallucination and drift, and whether their numbers hold for your use case or only for the demo. We map this against NIST AI RMF and ISO/IEC 42001 where they apply.
Support, SLAs and Exit
What happens when it breaks at 2am, when they change the model under you, or when you want to leave? We check the SLA, the change-notification terms, the support tiers, and the exit path: whether you can get your data out, and what happens to it on their side. Lock-in is a cost you pay later for a signature you give now.
We run this across sectors where a bad vendor choice carries real cost. See how it lands for B2B SaaS and financial services.
Evaluation sits alongside our AI Governance work, which builds the controls once you've chosen. Check where you stand with the free EU AI Act readiness tool.
WHY US
Independent, and Honest About the Answer
We don't resell any model, so we've no reason to steer you toward one. The right choice depends on your use case, and we'll say so even when a cleaner recommendation would sound better. That independence is the point of hiring us instead of asking the vendor.
NO VENDOR TIES
We Don't Resell
No affiliate deals, no reseller margin, no model we're quietly paid to recommend. The evaluation answers to you, not to a vendor's pipeline.
FRAMEWORK-LED
Named Standards
EU AI Act, GDPR Article 28, NIST AI RMF, ISO/IEC 42001, SOC 2, and the OWASP LLM Top 10. Every finding cites the framework it's measured against.
DEPENDS ON YOU
No Fixed Winner
Well-governed platforms like Claude or AWS Bedrock often score well on controls, but the right pick turns on your requirements. We score the real options against them.
QUESTIONS
What Buyers Ask Before They Sign
How do you evaluate an AI vendor?
We scope against your use case first, because the right answer depends on what you're deploying and the decisions it drives. Then we score the model and the vendor against a fixed rubric: security tested to the OWASP LLM Top 10, data and model provenance, your EU AI Act role, the GDPR data processing agreement under Article 28, accuracy and evaluation mapped to NIST AI RMF and ISO/IEC 42001, human oversight, support and SLAs, and exit. Each area gets a red, amber, or green rating with the evidence behind it. You get a one-page summary a board can read and a detailed annex your technical team can defend.
Are we liable for a third-party AI tool?
Often, yes. Buying an AI tool doesn't outsource the risk. Under the EU AI Act, deploying a third-party system usually makes you the deployer, which carries duties under Article 26 that the vendor doesn't discharge for you. If the tool processes personal data, GDPR typically leaves you, as the controller, responsible for the processing even when a vendor runs it, and Article 28 governs what your data processing agreement has to say. We establish exactly where you sit as provider or deployer, what obligations attach, and where the contract quietly pushes liability back onto you.
What evidence should we demand from an AI vendor?
Ask for the things a serious vendor already has. A SOC 2 report, not a claim of one. Evidence they test against the OWASP LLM Top 10. A data processing agreement that meets GDPR Article 28, with sub-processors and transfer mechanisms named. Model and data provenance you can verify, including whether your inputs train their model. Their accuracy evaluation method and results, not just a demo. Alignment with NIST AI RMF or an ISO/IEC 42001 management system. Clear SLAs, change-notification terms, and a documented exit path. If a vendor can't produce these, that absence is itself your finding.
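For your own triage, the evidence list above can be turned into a simple gap check. The sketch below is illustrative only: the short keys in `REQUIRED_EVIDENCE`, the `missing_evidence` helper, and the sample vendor response are hypothetical, and the descriptions just restate the items named in this answer.

```python
# Evidence items restated from the answer above; the keys are hypothetical labels.
REQUIRED_EVIDENCE = {
    "soc2_report": "SOC 2 report (the report itself, not a claim of one)",
    "owasp_llm_testing": "Evidence of testing against the OWASP LLM Top 10",
    "dpa_article_28": "DPA meeting GDPR Article 28, sub-processors and transfers named",
    "provenance": "Verifiable model and data provenance, including training on inputs",
    "accuracy_evaluation": "Accuracy evaluation method and results",
    "framework_alignment": "Alignment with NIST AI RMF or ISO/IEC 42001",
    "sla_and_exit": "SLAs, change-notification terms, and a documented exit path",
}


def missing_evidence(received: set[str]) -> list[str]:
    """Return descriptions of required items the vendor has not produced."""
    return [desc for key, desc in REQUIRED_EVIDENCE.items() if key not in received]


# Hypothetical vendor response, for illustration only.
received = {"soc2_report", "dpa_article_28", "sla_and_exit"}
for finding in missing_evidence(received):
    print("FINDING, not provided:", finding)
```

Each missing item is recorded as a finding in its own right rather than silently skipped, which mirrors the point that absence of evidence is itself a result.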
Which AI models are best for compliance?
It depends on your requirements, and anyone who gives you a fixed answer without asking about your use case is selling something. For compliance-sensitive work, well-governed enterprise platforms such as Claude or AWS Bedrock often score well on the controls that matter, including data handling, provenance, and enterprise security posture. But the right choice turns on your specific use case, your data, your regulatory context, and the decisions the model drives. Our job is to score the real options against your requirements and tell you honestly which one fits, not to hand you a league table that ignores your context.
Do you help with the procurement or security review?
Yes. This is done-for-you evaluation, not just a report you're left to act on. We sit in the vendor review, draft the questions procurement and your CISO should be asking, and check the answers against the evidence when they come back. The free vendor-assessment checklist is the starting tool for your own triage. This service is the independent due diligence behind a decision you can defend to your board, your regulator, and your auditors.
EVALUATE A VENDOR
Sign With Your Eyes Open
Start your own triage with the free vendor-assessment checklist, or bring us in for independent due diligence on the model and vendor before you buy or ship.
Get Started