
DeepSeek V3
| Client | Meridian Health Analytics | Report ID | MV-2026-DSV3-001 |
|---|---|---|---|
| System Under Assessment | DeepSeek-Chat V3 | Assessment Date | March 18, 2026 |
| Vendor / Provider | DeepSeek AI | Issue Date | March 24, 2026 |
| Domain · Use Case | Healthcare · Clinical decision support drafting | Validity Period | Mar 24, 2026 → Mar 24, 2027 |
| Assessment Type | Baseline + Client Scenario | MAAC Instrument | v4.7 |
| Lead Assessor | Abdalla Doleh, PhD | Seal Authorization | Conditional |
Cognitive Profile Overview
DeepSeek V3 was assessed against (a) a controlled synthetic baseline scenario set for the defined healthcare clinical decision-support drafting use case and (b) Meridian Health Analytics' client-specific operational scenarios for the same use case. Both runs used the MAAC v4.7 instrument across all nine cognitive dimensions (baseline n = 2,195; client n = 412). On the baseline corpus the system scores in the Strong band on output-oriented dimensions (Tool Execution 94, Content Quality 93, Cognitive Load 89). Hallucination Control declines materially from baseline to client (78 → 67, an 11-point gap), a decline associated with client-specific ambiguous-evidence cases.
The certification outcome is Certified with Conditions for supervised decision-support drafting only, contingent on the required controls documented in Section 15. The system is not certified for unsupervised clinical use, autonomous decisioning, or regulatory submission without expert review.

Leadership-Level Summary
MaacVerify assessed DeepSeek V3 using a controlled baseline-to-client scenario gap assessment method. The system was first evaluated against a domain- and use-case-specific synthetic baseline scenario set, then against Meridian's client-specific operational scenarios using the same MAAC framework. The resulting comparison identifies performance gaps, operational risks, required controls, and certification conditions stated below.
This certification applies only to the system, version, configuration, corpus, intended use, deployment context, and validity period specified in this report. It does not extend to materially modified versions, altered prompts, new tools, new deployment environments, additional use cases, autonomous use cases, or workflows not expressly included in scope (Section 03).

What Was Assessed — and What Was Excluded
| System name | DeepSeek-Chat V3 |
|---|---|
| System version / build | v3.0.2 · build 2026.02-r4 |
| Vendor / provider | DeepSeek AI |
| Deployment configuration | API · system prompt v2.1 · retrieval over Meridian KB · no autonomous tools |
| Assessment environment | MaacVerify isolated assessment harness · no production data egress |
| Domain | Healthcare — outpatient internal medicine |
| Use case | Drafting clinical decision-support summaries for physician review |
| User role | Licensed attending physician (final reviewer) |
| Output type | Structured draft summaries and recommendation lists |
| Assessment window | March 4 – March 18, 2026 |
| Scenario counts | Baseline 2,195 · Client 412 |
| Exclusions | Autonomous diagnosis · unsupervised clinical use · regulatory submission · enterprise-wide use · cybersecurity certification · privacy certification |
Permitted, Conditional, and Prohibited Uses
This certification applies only to supervised clinical decision-support drafting in which outputs are reviewed by a licensed attending physician before being used for any clinical, operational, or documentation decision. It does not authorize autonomous diagnosis, unsupervised clinical use, regulatory submission without expert review, or any use outside the defined intended-use scope.
Baseline-to-Client Scenario Gap Assessment
- Define domain, use case, system, configuration, and intended use.
- Generate a synthetic baseline scenario set for the assessed domain.
- Run baseline scenarios through the system; adjudicate using MAAC v4.7.
- Conduct quality review of baseline assessment results.
- Run client-specific operational scenarios through the same MAAC process.
- Conduct quality review of client scenario results.
- Compare baseline and client performance dimension-by-dimension.
- Identify gaps, failure modes, and operational meaning.
- Assign required controls and monitoring conditions.
- Issue the certification decision documented in Section 02.

Controlled Synthetic Baseline
The baseline scenario set represents expected domain- and use-case-specific task demands under controlled assessment conditions. It establishes the structured reference point for comparison against client-specific scenarios. The baseline is not intended to represent all possible deployment conditions.
| Category | Count | Complexity | Risk | Notes |
|---|---|---|---|---|
| Typical workflow cases | 1,240 | Simple / Moderate | Low / Moderate | Standard CDS drafting patterns |
| Edge cases | 520 | Moderate / Complex | Moderate / High | Atypical presentations, rare comorbidities |
| Ambiguous evidence cases | 275 | Complex | High | Conflicting or incomplete chart data |
| Failure-prone cases | 160 | Complex | High | Adversarially constructed from prior incident registry |
Baseline Results — Overall 83 / 100
| # | Dimension | Score | Status |
|---|---|---|---|
| 01 | Cognitive Load | 89 | Strong |
| 02 | Tool Execution | 94 | Strong |
| 03 | Content Quality | 93 | Strong |
| 04 | Memory Integration | 78 | Monitor |
| 05 | Complexity Handling | 79 | Monitor |
| 06 | Hallucination Control | 78 | Monitor |
| 07 | Knowledge Transfer | 75 | Monitor |
| 08 | Processing Efficiency | 76 | Monitor |
| 09 | Process-Outcome Alignment | 87 | Strong |
Baseline scores reflect observed system behavior under controlled baseline assessment conditions. They do not guarantee performance in client-specific deployment environments or future system versions.

Meridian Health Analytics — Operational Scenarios
Client-specific scenarios were derived from Meridian's clinical decision-support SOPs, redacted historical cases, and physician interview-derived workflows. All scenarios were de-identified prior to assessment and reviewed by Meridian's clinical informatics lead.
| Scenario Source | Count | Description | Review Status |
|---|---|---|---|
| Client-provided examples | 120 | Curated CDS prompts from production logs | Reviewed |
| SOP-derived workflows | 140 | Generated from clinical-pathway SOPs | Reviewed |
| Expert interview-derived | 92 | Edge-case prompts from 6 attending physicians | Reviewed |
| Redacted historical cases | 60 | De-identified prior-incident cases | Reviewed |
Client Results — Overall 78 / 100
| # | Dimension | Score | Status |
|---|---|---|---|
| 01 | Cognitive Load | 86 | Strong |
| 02 | Tool Execution | 91 | Strong |
| 03 | Content Quality | 90 | Strong |
| 04 | Memory Integration | 72 | Monitor |
| 05 | Complexity Handling | 75 | Monitor |
| 06 | Hallucination Control | 67 | Monitor |
| 07 | Knowledge Transfer | 70 | Monitor |
| 08 | Processing Efficiency | 73 | Monitor |
| 09 | Process-Outcome Alignment | 82 | Strong |
Client scenario results reflect observed behavior under Meridian's assessed scenario set. They do not represent all possible client workflows, users, edge cases, or future deployment conditions.
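
The overall scores reported for the two runs (baseline 83, client 78) are consistent with a simple average of the nine dimension scores. The report does not state how the composite is weighted, so the equal weighting in this sketch is an assumption, and the `composite` helper is a hypothetical name used only for illustration.

```python
from statistics import mean

# Dimension scores copied from the baseline and client results tables.
BASELINE = {
    "Cognitive Load": 89,
    "Tool Execution": 94,
    "Content Quality": 93,
    "Memory Integration": 78,
    "Complexity Handling": 79,
    "Hallucination Control": 78,
    "Knowledge Transfer": 75,
    "Processing Efficiency": 76,
    "Process-Outcome Alignment": 87,
}
CLIENT = {
    "Cognitive Load": 86,
    "Tool Execution": 91,
    "Content Quality": 90,
    "Memory Integration": 72,
    "Complexity Handling": 75,
    "Hallucination Control": 67,
    "Knowledge Transfer": 70,
    "Processing Efficiency": 73,
    "Process-Outcome Alignment": 82,
}


def composite(scores: dict[str, int]) -> float:
    """Unweighted mean of the dimension scores (equal weighting is an assumption)."""
    return mean(scores.values())


baseline_composite = composite(BASELINE)
client_composite = composite(CLIENT)

print(f"baseline: {baseline_composite:.2f} (rounded {round(baseline_composite)})")
print(f"client:   {client_composite:.2f} (rounded {round(client_composite)})")
```

If MAAC v4.7 applies dimension weights, the rounded values could differ; the agreement here only shows that the published composites are compatible with equal weighting.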

Side-by-Side Cognitive Profile
The comparative radar map overlays the controlled baseline (navy) and the client-specific operational results (gold). The gap analysis table below quantifies dimension-level divergence and assigns each gap a status using MaacVerify's threshold profile: Stable (0–3) · Monitor (4–7) · Material (8–12) · Critical (13+).
| Dimension | Base | Client | Gap | Status |
|---|---|---|---|---|
| Cognitive Load | 89 | 86 | -3 | Stable |
| Tool Execution | 94 | 91 | -3 | Stable |
| Content Quality | 93 | 90 | -3 | Stable |
| Memory Integration | 78 | 72 | -6 | Monitor |
| Complexity Handling | 79 | 75 | -4 | Monitor |
| Hallucination Control | 78 | 67 | -11 | Material Gap |
| Knowledge Transfer | 75 | 70 | -5 | Monitor |
| Processing Efficiency | 76 | 73 | -3 | Stable |
| Process-Outcome Alignment | 87 | 82 | -5 | Monitor |
| Composite | 83 | 78 | -5 | Monitor |
Gaps do not necessarily indicate system failure. They identify areas where client-specific conditions create additional operational risk, monitoring needs, or control requirements. The material gap in Hallucination Control (HC) drives the conditional certification outcome and the required-controls list in Section 15.
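
The threshold profile above can be applied mechanically to the score pairs in the gap table. The following is a minimal sketch, not MaacVerify's implementation: `classify_gap` is a hypothetical helper, and applying the thresholds to the absolute value of the gap (so that an improvement would be banded the same way as a decline) is an assumption, since the report only shows declines.

```python
def classify_gap(gap_points: int) -> str:
    """Map a baseline-to-client gap (in points) to the report's status bands."""
    size = abs(gap_points)  # assumption: direction of the gap is ignored
    if size <= 3:
        return "Stable"
    if size <= 7:
        return "Monitor"
    if size <= 12:
        return "Material"
    return "Critical"


# (baseline, client) scores copied from the gap analysis table.
SCORES = {
    "Cognitive Load": (89, 86),
    "Tool Execution": (94, 91),
    "Content Quality": (93, 90),
    "Memory Integration": (78, 72),
    "Complexity Handling": (79, 75),
    "Hallucination Control": (78, 67),
    "Knowledge Transfer": (75, 70),
    "Processing Efficiency": (76, 73),
    "Process-Outcome Alignment": (87, 82),
    "Composite": (83, 78),
}

# Gap is client minus baseline, so a decline is negative.
gap_status = {
    name: (client - base, classify_gap(client - base))
    for name, (base, client) in SCORES.items()
}

for name, (gap, status) in gap_status.items():
    print(f"{name:<26} {gap:+4d}  {status}")
```

Run against the table values, this reproduces the Stable, Monitor, and Material assignments shown in the table.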

What the Gaps Mean in Practice
| Gap | Operational Meaning | Risk | Required Control | Cert Impact |
|---|---|---|---|---|
| Hallucination Control (HC) −11 | Increased unsupported factual claims under ambiguous-evidence chart cases | High | Source verification + attending physician review | Certified with Conditions |
| Memory Integration (MI) −6 | Context loss in extended chart-summary workflows beyond 8k tokens | Moderate | Context-length limits + regression testing | Monitor |
| Knowledge Transfer (KT) −5 | Reduced generalization to atypical presentations | Moderate | Domain-specific reviewer confirmation on rare cases | Monitor |
| Process-Outcome Alignment (POA) −5 | Mild reasoning-output drift on multi-step recommendation chains | Moderate | Structured output templates + reasoning trace logging | Monitor |
Risk Tier: Moderate-High
Classification factors: healthcare domain; advisory-only autonomy level; reversible outputs subject to attending-physician review; regulated data sensitivity (client scenarios were de-identified prior to MaacVerify assessment); qualified human oversight present; traceability through audit logs; controlled change process; and departmental deployment scale. The classification reflects observed risk factors and is consistent with the Certified-with-Conditions outcome.
Certification threshold logic. Client scenario scores below 80 may remain eligible for Certified with Conditions status where material gaps are bounded to specific dimensions, required controls are available, prohibited uses are excluded, and the intended use remains supervised. Composite scores below 65, or unbounded gaps across multiple high-risk dimensions, are not eligible.
| Composite Score Band | Certification Interpretation |
|---|---|
| ≥ 80 | Eligible for Certified or Certified with Conditions |
| 65 – 79 | Conditional range — requires bounded gaps, available controls, and supervised use |
| < 65 | Not eligible for certification |
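
The threshold logic and score bands above can be read as a small decision rule. The sketch below is one possible encoding under stated assumptions: the class and function names are hypothetical, the four boolean criteria are simplified from the prose (in particular, `gaps_bounded` stands in for the absence of "unbounded gaps across multiple high-risk dimensions"), and non-integer composites between 79 and 80 are treated as part of the conditional range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalCriteria:
    """Conditions the report lists for scores in the conditional range."""
    gaps_bounded: bool
    controls_available: bool
    prohibited_uses_excluded: bool
    supervised_use: bool

    def all_met(self) -> bool:
        return (
            self.gaps_bounded
            and self.controls_available
            and self.prohibited_uses_excluded
            and self.supervised_use
        )


def score_band(composite: float) -> str:
    """Return the composite score band from the interpretation table."""
    if composite >= 80:
        return ">= 80"
    if composite >= 65:
        return "65-79"
    return "< 65"


def eligible_with_conditions(composite: float, criteria: ConditionalCriteria) -> bool:
    """Whether Certified with Conditions remains available (simplified reading)."""
    # Below 65, or with unbounded gaps, the report states no eligibility.
    if composite < 65 or not criteria.gaps_bounded:
        return False
    # In the conditional range every listed criterion must hold.
    if composite < 80:
        return criteria.all_met()
    return True


meridian = ConditionalCriteria(True, True, True, True)
print(score_band(78), eligible_with_conditions(78, meridian))
```

For the client composite of 78 with all four criteria satisfied, this reading places the system in the conditional range and keeps it eligible, which matches the outcome stated in the report.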

Bias Examination and Distributional Fairness
MaacVerify conducted a structured fairness examination covering cognitive-bucket distribution mismatch, complexity-tier representativeness, de-identification verification, and dimension-level disparity analysis across the baseline and client corpora.
| Bias Dimension | Finding | Status |
|---|---|---|
| Cognitive bucket distribution (RQ3) | Single-bucket engagement; mismatch index = 0.00. No distributional differences are present that could drive performance deviations. | No bias signal |
| Complexity tier balance | Client tier distribution within 20pp of baseline across all tiers. Adversarial distribution check passed. | Within bounds |
| Scenario de-identification | Client scenarios de-identified prior to assessment. No protected-class identifiers present in corpus. | Confirmed |
| Dimension-level disparity | Hallucination Control departure explained by domain-specific ambiguous-evidence cases, not demographic proxies. No disparate impact detected. | None detected |
This examination satisfies EU AI Act Art. 10(4), NIST AI RMF MEASURE 2.6, and MAS FEAT F1–F2. It does not substitute for a full algorithmic fairness audit where required by law.
Permitted, Conditional, Prohibited
| Use Case | Status | Required Conditions |
|---|---|---|
| Internal analytical planning | Permitted | Human review |
| Low-risk content generation | Permitted | Review before use |
| Clinical decision-support drafting (in scope) | Conditional | Attending review + evidence verification |
| Regulatory documentation support | Conditional | QA / regulatory expert review |
| Autonomous diagnosis | Prohibited | Not certified |
| Unsupervised high-impact decisions | Prohibited | Not certified |

Certification Conditions Checklist
Certified use is conditional upon implementation and maintenance of the controls listed below. Controls marked Pending must be closed under Section 15a before operational authorization and public seal use take effect. Failure to maintain Met controls may limit, suspend, or void the certification.
| Control | Required | Owner | Evidence | Status |
|---|---|---|---|---|
| Qualified human review (attending physician) | Yes | Client | Review SOP · role definition | Met |
| Source / evidence verification workflow | Yes | Client | Verification workflow doc | Pending |
| Prompt / configuration version control | Yes | Client | Version records | Met |
| Input / output logging | Yes | Client | Log policy + samples | Met |
| Escalation protocol for HC flags | Yes | Client | Escalation SOP | Pending |
| User training (physicians + informatics) | Yes | Client | Training record | Met |
| Data governance controls (PHI handling) | Yes | Client | Data policy | Met |
| Incident reporting | Yes | Client | Incident SOP | Met |
| Drift monitoring (quarterly) | Yes | Client + MaacVerify | Monitoring plan | Pending |
| Reassessment process | Yes | Client + MaacVerify | Reassessment plan | Met |
Pending Controls — Required for Active Authorization
| Pending Control | Required Evidence | Responsible Party | Due Date | Certification Impact |
|---|---|---|---|---|
| Source / evidence verification workflow | Approved workflow document + role mapping | Client | Apr 24, 2026 | Required before active operational use |
| HC escalation protocol | Signed escalation SOP | Client | Apr 24, 2026 | Required before public seal use |
| Drift monitoring plan | Signed quarterly monitoring plan | Client + MaacVerify | May 8, 2026 | Required for ongoing certification validity |
Until all pending controls are verified as Met, or formally accepted in writing by MaacVerify under a corrective-action plan, the certification remains Conditionally Issued. The certification decision stands, but certified operational use and public seal use remain suspended unless expressly authorized in writing. Acceptance under a corrective-action plan requires written MaacVerify approval and may still restrict operational authorization, seal use, or both until specified milestones are closed.
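
The gating rule in the paragraph above can be sketched as a check over the control statuses from the checklist. This is an illustrative reading only: the function names and the `"Active"` label are hypothetical, and the sketch deliberately ignores the caveat that acceptance under a corrective-action plan may still restrict operational authorization or seal use.

```python
# Control statuses copied from the certification conditions checklist.
CONTROLS = {
    "Qualified human review (attending physician)": "Met",
    "Source / evidence verification workflow": "Pending",
    "Prompt / configuration version control": "Met",
    "Input / output logging": "Met",
    "Escalation protocol for HC flags": "Pending",
    "User training (physicians + informatics)": "Met",
    "Data governance controls (PHI handling)": "Met",
    "Incident reporting": "Met",
    "Drift monitoring (quarterly)": "Pending",
    "Reassessment process": "Met",
}


def blocking_controls(controls, accepted_in_writing=frozenset()):
    """Controls that are neither Met nor accepted in writing by MaacVerify."""
    return [
        name
        for name, status in controls.items()
        if status != "Met" and name not in accepted_in_writing
    ]


def certification_state(controls, accepted_in_writing=frozenset()):
    """'Conditionally Issued' while any control still blocks, else 'Active'."""
    if blocking_controls(controls, accepted_in_writing):
        return "Conditionally Issued"
    return "Active"


open_items = blocking_controls(CONTROLS)
print(certification_state(CONTROLS), open_items)
```

With the statuses as issued, three controls remain open and the state stays Conditionally Issued.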

Observed & Plausible Failure Modes
| ID | Failure Mode | Dim. | Sev. | Lik. | Trigger | Control | Residual |
|---|---|---|---|---|---|---|---|
| FM-001 | Unsupported factual claim | HC | High | Med | Ambiguous evidence | Source verification | Moderate |
| FM-002 | Weak uncertainty signaling | HC | High | Med | Incomplete chart | Expert review | Moderate |
| FM-003 | Context loss in long workflows | MI | Mod | Med | >8k tokens | Context limits / testing | Low / Mod |
| FM-004 | Overgeneralization on rare cases | KT | Mod | Low | Novel presentation | Domain expert check | Low |
| FM-005 | Process–output mismatch | POA | High | Low | Complex pathway | Structured templates | Moderate |
Sample Evidence Records
| Evidence ID | Type | Dimension | Finding | Score | Linked Failure Mode | Required Control |
|---|---|---|---|---|---|---|
| EV-001 | Baseline | Content Quality | Strong | 93 | — | Standard review |
| EV-002 | Client | Hallucination Control | Material gap | 67 | FM-001, FM-002 | Source verification + attending review |
| EV-003 | Client | Memory Integration | Monitor | 72 | FM-003 | Context-length limits + regression testing |
| EV-004 | Baseline | Tool Execution | Strong | 94 | — | Standard review |
| EV-005 | Client | Knowledge Transfer | Monitor | 70 | FM-004 | Domain expert confirmation on rare cases |
Full evidence corpus (n = 2,607) is retained in MaacVerify's evidence store and available for audit under the engagement NDA. The table above is a representative sample.
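
The corpus size quoted above can be reconciled with the scenario tables earlier in the report. The following short check simply sums the published category and source counts; the variable names are illustrative.

```python
# Scenario counts copied from the baseline category table.
BASELINE_COUNTS = {
    "Typical workflow cases": 1240,
    "Edge cases": 520,
    "Ambiguous evidence cases": 275,
    "Failure-prone cases": 160,
}
# Scenario counts copied from the client scenario source table.
CLIENT_COUNTS = {
    "Client-provided examples": 120,
    "SOP-derived workflows": 140,
    "Expert interview-derived": 92,
    "Redacted historical cases": 60,
}

baseline_total = sum(BASELINE_COUNTS.values())
client_total = sum(CLIENT_COUNTS.values())
corpus_total = baseline_total + client_total

print(f"baseline={baseline_total} client={client_total} corpus={corpus_total}")

# Share of each baseline category, useful when judging representativeness.
for name, count in BASELINE_COUNTS.items():
    print(f"{name:<26} {count / baseline_total:6.1%}")
```

The totals agree with the figures stated in the report: 2,195 baseline scenarios, 412 client scenarios, and 2,607 evidence records overall.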

Unless expressly stated in the assessment scope, MaacVerify does not independently certify the client's privacy, cybersecurity, data retention, or regulatory compliance posture. Certified use assumes the client maintains data governance, access controls, audit logging, vendor controls, and incident-response processes appropriate to the assessed deployment environment and risk tier.
| Governance Need | MaacVerify Evidence | Report Section |
|---|---|---|
| Performance documentation | Baseline + client MAAC scores | Sections 07, 09, 10 |
| Risk management | Risk tier + failure mode register | Sections 12, 15 |
| Human oversight | Required controls | Section 15 |
| Change control | Reassessment triggers | Section 20 |
| Audit readiness | Evidence traceability | Section 17 |
| Board / procurement review | Executive summary + decision | Sections 02, 20 |
Certification remains valid until March 24, 2027 unless voided by any of the following: model update, architecture change, prompt or system-instruction change, RAG / source corpus change, tool / plugin / API change, deployment workflow change, intended-use expansion, user population change, data type or sensitivity change, incident or suspected harm, drift exceeding ±2.5% on any monitored dimension, removal or failure of required controls, or expiration of the validity period. Without an active monitoring arrangement, MaacVerify makes no representation regarding continued performance after the assessment date.
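
The drift trigger above can be expressed as a per-dimension check against the certified client scores. The report does not say whether "±2.5%" means a relative change or percentage points, so the sketch below assumes a relative change from the certified score; the function names are hypothetical and the observed scores are invented solely to exercise the check.

```python
DRIFT_LIMIT_PCT = 2.5  # limit quoted in the reassessment triggers

# Certified client scores for three of the nine dimensions (from the report).
CERTIFIED_CLIENT_SCORES = {
    "Tool Execution": 91,
    "Memory Integration": 72,
    "Hallucination Control": 67,
}

# Hypothetical monitoring-period scores; NOT data from the report.
observed_scores = {
    "Tool Execution": 93,
    "Memory Integration": 71,
    "Hallucination Control": 65,
}


def drift_pct(certified: float, observed: float) -> float:
    """Relative change from the certified score, in percent (assumed definition)."""
    return (observed - certified) / certified * 100.0


def dimensions_over_limit(certified, observed, limit_pct=DRIFT_LIMIT_PCT):
    """Return dimensions whose absolute relative drift exceeds the limit."""
    flagged = {}
    for name, base in certified.items():
        change = drift_pct(base, observed[name])
        if abs(change) > limit_pct:
            flagged[name] = round(change, 2)
    return flagged


flagged = dimensions_over_limit(CERTIFIED_CLIENT_SCORES, observed_scores)
print(flagged)
```

Under the relative-change assumption, a two-point drop on a score of 67 already exceeds the limit, while a two-point rise on a score of 91 does not; a percentage-point definition would treat both identically, so the intended definition should be confirmed with MaacVerify before any monitoring is built on it.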

MaacVerify provides independent assessment of AI system performance, limitations, failure modes, and deployment-readiness conditions using defined evaluation criteria. This report reflects observed performance under the assessment scope, corpus, configuration, and date stated herein. MaacVerify does not build, sell, train, operate, deploy, supervise, or control the assessed system in the client environment. Deployment decisions, regulatory compliance, clinical judgment, legal judgment, user training, data governance, security controls, workflow integration, human oversight, and operational outcomes remain the responsibility of the client. This report does not guarantee future performance, absence of errors, regulatory approval, clinical safety, legal sufficiency, business outcomes, or fitness for use outside the certified scope.
Seal authorization: Conditional. The MaacVerify mark may be used only with the approved claim language: "This deployment has been independently assessed by MaacVerify under Report ID MV-2026-DSV3-001 for the defined use, version, controls, and validity period stated in the certification report." The mark may not be used for other products, versions, broader enterprise claims, after expiration or revocation, after material system changes, or for unsupervised or autonomous uses. Prohibited claims include "MaacVerify approved," "Certified safe," "Guaranteed accurate," "Clinically approved," and "Error-free."
Pending-control restriction. The certification mark may not be used publicly until all required controls marked Pending in Section 15 are verified as Met or formally accepted by MaacVerify under a written corrective-action plan. Display of the mark prior to that closure is unauthorized and voids the conditional seal grant.
No implied certification claim may be made through screenshots, partial excerpts, numerical scores, dimensional sub-scores, radar visuals, or internal report graphics. The certified claim is established only by the full approved claim language displayed with an active Report ID.
Public-use rules. Public use is limited to the approved claim language and the certification mark, displayed only with the Report ID and only while the certification is active. The client may not publish the full report or any executive summary without separate written authorization from MaacVerify. Partial quotation, excerpting, or visual reuse of charts, scores, or tables for marketing purposes is prohibited unless expressly approved in writing.

MaacVerify confirms that DeepSeek-Chat V3 (DeepSeek AI) was assessed under the Multi-Dimensional Assessment for AI Cognition (MAAC) framework version 4.7 using a baseline-to-client scenario gap method, comprising 2,195 baseline and 412 client-specific scenarios across the defined healthcare decision-support drafting use case.
Based on the evidence reviewed, the system meets the criteria for Certified with Conditions status for the defined supervised decision-support use case. Composite scores: baseline 83/100 · client 78/100. This certification is conditional upon maintenance of the required controls (Section 15), adherence to the certified intended use (Section 04), and compliance with reassessment triggers (Section 20) and certification mark rules (Section 22).
This certification does not constitute a guarantee of future performance, legal compliance, regulatory approval, clinical safety, professional sufficiency, or operational outcomes.
| Report ID | MV-2026-DSV3-001 | Issue Date | March 24, 2026 |
|---|---|---|---|
| System | DeepSeek-Chat V3 | Validity | Mar 24, 2026 → Mar 24, 2027 |
| Domain | Healthcare CDS Drafting | MAAC Instrument | v4.7 |
