Showing posts with label AI Prompt for software testing. Show all posts

Thursday, February 19, 2026

AI Prompt - ML / AI-Specific Testing

AI Prompt

"Create test cases for [ML model/pipeline]. Include data quality checks, drift detection, edge inputs, fairness/bias slices, latency under load, and rollback if metrics regress."

Applying Critical Thinking

·         Data is part of the contract: Model behavior depends on input distribution; tests should cover data quality, schema, and known edge distributions.

·         Slices matter: Aggregate metrics can hide regressions for subgroups; define slices (demographic, region, product) and test them.

·         Drift and lifecycle: Both the input data and the relationship between inputs and labels (concept) drift over time; tests should include drift detection and criteria for retraining or rollback.

·         Determinism and reproducibility: Where possible, fixed seeds and fixtures; document non-deterministic areas and their impact on assertions.
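
To make the "slices matter" point concrete, here is a minimal sketch of a per-slice accuracy check. The record layout (`label`, `prediction`, `region`), the function names, and the 0.8 bar are illustrative assumptions, not part of any particular framework.

```python
from collections import defaultdict

def accuracy_by_slice(records, slice_key):
    """Return {slice_value: accuracy} for records holding 'label' and 'prediction'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        s = r[slice_key]
        total[s] += 1
        if r["label"] == r["prediction"]:
            correct[s] += 1
    return {s: correct[s] / total[s] for s in total}

def slices_below_bar(per_slice, min_accuracy):
    """Names of slices whose accuracy falls below the minimum bar."""
    return sorted(s for s, acc in per_slice.items() if acc < min_accuracy)

# Hypothetical evaluation records: the aggregate looks fine, one slice does not.
records = [
    {"region": "EU", "label": 1, "prediction": 1},
    {"region": "EU", "label": 0, "prediction": 0},
    {"region": "EU", "label": 1, "prediction": 1},
    {"region": "EU", "label": 0, "prediction": 0},
    {"region": "APAC", "label": 1, "prediction": 0},
    {"region": "APAC", "label": 0, "prediction": 0},
]

overall = sum(r["label"] == r["prediction"] for r in records) / len(records)
per_slice = accuracy_by_slice(records, "region")
failing = slices_below_bar(per_slice, min_accuracy=0.8)
```

Here the overall accuracy clears the assumed 0.8 bar while the APAC slice does not, which is exactly the regression an aggregate-only assertion would miss.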

Generate Test Cases for Each Feature

·         Data quality: Schema validation (types, ranges, nulls); missing or corrupt features; duplicates; label quality (if supervised); train/serve skew (the same preprocessing is applied at training and serving time).

·         Drift detection: Input distribution drift (stats, histograms); concept drift (label distribution or performance over time); thresholds and alerts; dashboard or pipeline step.

·         Edge inputs: Empty input, all nulls, out-of-range values, very long text, special characters; the model doesn’t crash and returns a safe default or a clear error.

·         Fairness/bias slices: Performance (accuracy, F1, etc.) per slice (e.g. demographic, region); disparity metrics; minimum performance bar per slice; bias mitigation checks.

·         Latency under load: P50/P95/P99 latency at target RPS; batch vs online; GPU/CPU utilization; timeout and degradation behavior under overload.

·         Rollback / regression: When metrics regress (e.g. accuracy drop, fairness violation): rollback to previous model, feature flag, or fallback; pipeline and alerts tested.

·         Reproducibility: Same input + version → same output (where applicable); training run reproducible from config and data version.

·         Adversarial / robustness: Known adversarial or worst-case inputs; model doesn’t fail badly or expose unsafe behavior.
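
One way to turn the drift detection item above into an automated check is the Population Stability Index (PSI), a common choice among several (a KS test or a histogram distance would also work). The sketch below is a plain-Python version under stated assumptions: equal-width bins taken from the baseline range, a small `eps` to avoid dividing by zero, and an alert threshold of 0.2, which is only a frequently quoted rule of thumb and should be agreed per feature.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the baseline range
            counts[idx] += 1
        n = len(values)
        return [max(c / n, eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature samples: training baseline vs. shifted production traffic.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

DRIFT_THRESHOLD = 0.2  # assumed alert level, not a standard
drift_detected = psi(baseline, shifted) > DRIFT_THRESHOLD
```

A test case can then assert that identical distributions score near zero and that a deliberately shifted fixture trips the alert.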

Questions on Ambiguities

·         What metrics define “good” (e.g. accuracy, F1, fairness parity) and what regression threshold triggers rollback?

·         Which slices are required for fairness reporting (e.g. age, region, product type) and what data is available?

·         How is drift defined (statistical test, threshold on distribution) and who acts on drift alerts?

·         What is acceptable latency (online vs batch) and what is the fallback when the model is slow or down?

·         Are explainability or audit logs required (e.g. feature contributions, request/response logging)?

·         Who approves model promotions and rollbacks (ML team, product, compliance)?
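
Once the metric and threshold questions above are answered, the rollback trigger can be expressed as a small gate. This is a sketch only: the metric names, the allowed drops, and the assumption that higher values are better for every metric are all placeholders.

```python
def regressions(baseline, candidate, max_drop):
    """Return {metric: drop} where the candidate is worse than allowed.

    Assumes every metric is higher-is-better.
    """
    found = {}
    for metric, allowed in max_drop.items():
        drop = baseline[metric] - candidate[metric]
        if drop > allowed:
            found[metric] = round(drop, 6)
    return found

# Hypothetical metric snapshots for the current and the newly promoted model.
baseline = {"accuracy": 0.91, "f1": 0.88}
candidate = {"accuracy": 0.90, "f1": 0.83}
max_drop = {"accuracy": 0.02, "f1": 0.03}

found = regressions(baseline, candidate, max_drop)
should_rollback = bool(found)
```

In this example accuracy stays within tolerance but F1 drops by 0.05, so the gate recommends rollback and names the metric that caused it.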

Areas Where Test Ideas Might Be Missed

·         Label and annotation quality: wrong or inconsistent labels in training/eval; impact on reported metrics and fairness.

·         Preprocessing parity: train vs serve preprocessing (tokenization, normalization, feature store); subtle skew.

·         Cold start and rare categories: new users or rare items; model behavior and fallback.

·         Feedback loops: model predictions influence future data (e.g. recommendations); long-term bias or collapse.

·         Versioning and A/B: multiple model versions in production; routing and metric attribution per version.

·         Security of model artifact: tampering, extraction, or inversion; not always in scope but worth noting.

·         Cost and resource: inference cost per request; GPU memory; batch size vs latency tradeoff under load.

Output Template

Context: [system/feature under test, dependencies, environment]

Assumptions: [e.g., auth method, data availability, feature flags]

Test Types: [ML, data quality, fairness, performance]

Test Cases:

ID: [TC-001]

Type: [ML/AI]

Title: [short name]

Preconditions/Setup: [data, env, mocks, flags]

Steps: [ordered steps or request details]

Variations: [inputs/edges/negative cases]

Expected Results: [responses/UI states/metrics]

Cleanup: [teardown/reset]

Coverage notes: [gaps, out-of-scope items, risk areas]

Non-functionals: [perf targets, security considerations, accessibility notes]

Data/fixtures: [test users, payloads, seeds]

Environments: [dev/stage/prod-parity requirements]

Ambiguity Questions:

- [Question 1 about unclear behavior]

- [Question 2 about edge case]

Potential Missed Ideas:

- [Suspicious area where tests might still be thin]

AI Prompt - Data Integrity / Migration Testing

AI Prompt

"Generate test cases for migrating data from [source] to [target]. Cover mapping rules, null/invalid handling, rounding, duplicates, referential integrity, rollback, and reconciliation checks."

Applying Critical Thinking

·         Define the golden source: Which system is authoritative for each entity? What is “correct” when source and target disagree?

·         Map every field: Document type, range, nullability, and transformation; explicit handling for unknown or invalid values.

·         Relationships first: Parent-child and foreign keys; order of migration and dependency graph; what happens to orphans.

·         Rollback is part of the design: Test rollback as a first-class scenario; ensure idempotency or clear “migrated” markers so re-runs are safe.
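
The idempotency and rollback points can be exercised with a very small harness. The sketch below uses an in-memory SQLite database as a stand-in for the real target; the `customers` table, the `normalize` rule, and the sample rows are hypothetical. It relies on `INSERT OR REPLACE` for re-runnable writes and on the connection's context manager to roll back a failed batch.

```python
import sqlite3

def normalize(email):
    # Hypothetical transformation rule: trim and lowercase.
    return email.strip().lower() if email is not None else None

def migrate(conn, source_rows):
    """Upsert all rows in one transaction; any failure rolls back the batch."""
    rows = [(r["id"], normalize(r["email"])) for r in source_rows]
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT OR REPLACE INTO customers (id, email) VALUES (?, ?)", rows
        )

def snapshot(conn):
    return conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")

source = [{"id": 1, "email": " A@Example.com "}, {"id": 2, "email": "b@example.com"}]
migrate(conn, source)
after_first_run = snapshot(conn)
migrate(conn, source)  # re-run: must not duplicate or change anything
after_second_run = snapshot(conn)

# A batch with an invalid row (NULL in a NOT NULL column) must leave no partial writes.
bad_batch = [{"id": 3, "email": "c@example.com"}, {"id": 4, "email": None}]
try:
    migrate(conn, bad_batch)
    batch_failed = False
except sqlite3.IntegrityError:
    batch_failed = True
after_failed_batch = snapshot(conn)
```

The two properties to assert are that the second run changes nothing and that the failed batch leaves the target exactly as it was before the batch started.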

Generate Test Cases for Each Feature

·         Mapping rules: Each field: source → target mapping; default values; derived fields; conditional logic (e.g. if status=X then target flag Y).

·         Null/invalid: Source null → target null or default; invalid enum/date/number → reject, default, or error row; empty string vs null.

·         Rounding/type: Decimals: precision and rounding mode; dates: timezone and truncation; strings: length limits and encoding; integers: overflow.

·         Duplicates: Same business key in source multiple times: first wins, last wins, merge, or reject; how duplicates are logged.

·         Referential integrity: Parent migrated before child; FKs valid; orphans: fail, skip, or default parent; circular refs handled.

·         Rollback: After partial/full run: rollback script leaves target in consistent state; re-run after rollback behaves as defined.

·         Reconciliation: Record counts (per table/type); checksums or hashes for critical fields; spot checks: sample IDs in source vs target; delta report.

·         Idempotency: Running migration twice: no duplicate rows, no double application of side effects; safe resume from checkpoint.
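
For the reconciliation item above, counts and per-row checksums can be combined into a simple delta report. This is a sketch with assumed names (`row_checksum`, `reconcile`) and an assumed row shape of plain dictionaries; a real migration would usually compute the hashes inside each database rather than in application memory.

```python
import hashlib

def row_checksum(row, fields):
    """SHA-256 over the selected fields, in a fixed order."""
    payload = "|".join(repr(row.get(f)) for f in fields)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def reconcile(source_rows, target_rows, key, fields):
    """Compare source and target by business key and field checksum."""
    # Note: duplicate keys collapse here; check duplicates separately.
    src = {r[key]: row_checksum(r, fields) for r in source_rows}
    tgt = {r[key]: row_checksum(r, fields) for r in target_rows}
    return {
        "source_count": len(src),
        "target_count": len(tgt),
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }

# Hypothetical rows: counts match, yet three different problems are present.
source = [
    {"id": 1, "amount": "10.00"},
    {"id": 2, "amount": "5.50"},
    {"id": 3, "amount": "7.25"},
]
target = [
    {"id": 1, "amount": "10.00"},
    {"id": 2, "amount": "5.5"},
    {"id": 4, "amount": "1.00"},
]

report = reconcile(source, target, key="id", fields=["amount"])
```

The example is built so that the record counts agree while the data does not, which is why counts alone are a weak go/no-go criterion.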

Questions on Ambiguities

·         What is the authoritative definition of each entity (source vs target) after go-live?

·         How should invalid or legacy-bad data be handled: reject row, write to quarantine table, or apply default and log?

·         What precision and rounding apply to money and percentages (e.g. half-up vs banker’s rounding, i.e. round-half-to-even)?

·         Duplicate key strategy: which duplicate wins, and are others logged for manual review?

·         What is the order of migration (tables/entities) and the rollback order; are there cross-system dependencies (e.g. message queue)?

·         Who signs off on reconciliation (counts vs full checksum vs sampling) and what is the go/no-go criterion?
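
The rounding question matters because the two common modes disagree on exact halves, and binary floats add a third source of surprise. The sketch below uses Python's standard `decimal` module; the `to_cents` helper and the sample values are illustrative.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def to_cents(value, mode):
    """Round a value to two decimal places using the given rounding mode."""
    return Decimal(value).quantize(Decimal("0.01"), rounding=mode)

# Exact halves: the two modes disagree when the preceding digit is even.
half_up = to_cents("0.125", ROUND_HALF_UP)      # Decimal('0.13')
half_even = to_cents("0.125", ROUND_HALF_EVEN)  # Decimal('0.12')

# They agree when the preceding digit is odd.
odd_up = to_cents("0.135", ROUND_HALF_UP)       # Decimal('0.14')
odd_even = to_cents("0.135", ROUND_HALF_EVEN)   # Decimal('0.14')

# Passing a float instead of a string carries the binary approximation along:
# 2.675 is stored slightly below 2.675, so even half-up rounds it down.
from_float = to_cents(2.675, ROUND_HALF_UP)     # Decimal('2.67')
from_text = to_cents("2.675", ROUND_HALF_UP)    # Decimal('2.68')
```

Useful migration fixtures therefore include values that end in an exact half, with both odd and even preceding digits, and at least one value that passed through a float column in the source.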

Areas Where Test Ideas Might Be Missed

·         Concurrent writes: source or target updated during migration; locking or snapshot strategy.

·         Large objects / blobs: size limits, streaming, and checksum for binaries; timeouts.

·         Soft deletes and history: migrate only active rows vs full history; deleted parent and child handling.

·         Encoded/encrypted fields: decrypt in source, transform, re-encrypt in target; key rotation during migration.

·         Audit and metadata: created_at, updated_at, migrated_at; preserving vs overwriting.

·         Feature flags or tenant scope: migrate only certain tenants or segments; rest stay on source until phased.

·         Downstream consumers: after cutover, do downstream systems see consistent data (e.g. cache invalidation, event replay).

Output Template

Context: [system/feature under test, dependencies, environment]

Assumptions: [e.g., auth method, data availability, feature flags]

Test Types: [data integrity, migration]

Test Cases:

ID: [TC-001]

Type: [data integrity/migration]

Title: [short name]

Preconditions/Setup: [data, env, mocks, flags]

Steps: [ordered steps or request details]

Variations: [inputs/edges/negative cases]

Expected Results: [responses/UI states/metrics]

Cleanup: [teardown/reset]

Coverage notes: [gaps, out-of-scope items, risk areas]

Non-functionals: [perf targets, security considerations, accessibility notes]

Data/fixtures: [test users, payloads, seeds]

Environments: [dev/stage/prod-parity requirements]

Ambiguity Questions:

- [Question 1 about unclear behavior]

- [Question 2 about edge case]

Potential Missed Ideas:

- [Suspicious area where tests might still be thin]

AI Prompt - Usability Testing

AI Prompt

"Provide accessibility/usability test cases for [page/flow]. Include keyboard-only navigation, screen reader announcements, focus management, color contrast, error messaging clarity, and timeouts."

Applying Critical Thinking

·         User-diverse: Test as a keyboard-only user and as a screen reader user; avoid the assumption “we use a mouse, so it’s fine”.

·         Focus and order: Focus order should match visual order and task flow; focus must not be lost in modals, dropdowns, or dynamic content.

·         Meaning not just presence: ARIA and semantics must convey meaning (e.g. “button”, “alert”, “current step”), not just labels.

·         Errors and timeouts: Messages must be clear, associated with fields, and not rely on color alone; timeouts should warn and allow extension where possible.

Generate Test Cases for Each Feature

·         Keyboard-only: Tab through all interactive elements; no trap; Enter/Space activate buttons/links; Escape closes modals; arrow keys in menus/listboxes; skip link works.

·         Screen reader: Landmarks and headings; button/link names and roles; form labels and errors; live regions for dynamic updates; table headers and scope; no redundant announcements.

·         Focus management: Focus moves to modal when opened and returns to the triggering element on close; focus visible (outline/ring); focus not lost after AJAX or route change; first focusable element is in view on load.

·         Color contrast: Text and UI components meet contrast ratio (e.g. 4.5:1 normal, 3:1 large); focus indicators visible; don’t rely on color alone for required/error/state.

·         Error messaging: Errors are announced (live region or aria-describedby); message text clear and actionable; associated with field; success/error distinguishable without relying on color alone.

·         Timeouts: Session timeout: warning before expiry, option to extend; long operations: progress or status announced; no silent failure.

·         Usability: Labels and instructions clear; destructive actions confirmed; consistent patterns (e.g. submit always same place).
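
The color contrast item can be checked programmatically. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio; the function names and the sample colors are illustrative, and it handles only 6-digit hex colors without alpha.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(hex_color):
    """Relative luminance of a '#RRGGBB' color (0.0 = black, 1.0 = white)."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)

def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)

# Two nearly identical greys on white land on opposite sides of the 4.5:1 bar.
passes_aa = contrast_ratio("#767676", "#FFFFFF") >= 4.5
fails_aa = contrast_ratio("#777777", "#FFFFFF") < 4.5
```

A check like this covers static color pairs only; text over images or gradients, and hover, focus, and disabled states, still need manual or tool-assisted review.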

Questions on Ambiguities

·         What level are we targeting (WCAG 2.1 A, AA, AAA) and for which pages/flows?

·         Which screen readers and browsers are in scope (e.g. NVDA + Firefox, VoiceOver + Safari, JAWS)?

·         How should session timeout behave: warning at N minutes, extend button, and what happens to in-progress form data?

·         Are error messages written in plain language and reviewed by support/copy?

·         Do we support reduced motion and prefers-color-scheme (dark/light), and are they part of this test set?

·         Who is responsible for remediation (dev vs design) when contrast or focus order fails?

Areas Where Test Ideas Might Be Missed

·         Dynamic content: injected lists, infinite scroll, SPA route changes: focus and announcements after load.

·         Third-party widgets: chat, video, payment iframes: keyboard access and screen reader support inside iframe.

·         CAPTCHA / auth challenges: alternative (e.g. audio CAPTCHA) or exemption path for assistive tech users.

·         Complex widgets: custom combo boxes, date pickers, tree views: full keyboard and ARIA pattern (e.g. roving tabindex).

·         Mobile screen readers: VoiceOver (iOS), TalkBack (Android): gestures and focus different from desktop.

·         RTL and localization: focus order in RTL; translated labels and errors; font size scaling.

·         Timeout during data entry: user types in form; session expires mid-field; ensure data loss is communicated and recovery path exists.

Output Template

Context: [system/feature under test, dependencies, environment]

Assumptions: [e.g., auth method, data availability, feature flags]

Test Types: [usability, accessibility, UI]

Test Cases:

ID: [TC-001]

Type: [usability/accessibility]

Title: [short name]

Preconditions/Setup: [data, env, mocks, flags]

Steps: [ordered steps or request details]

Variations: [inputs/edges/negative cases]

Expected Results: [responses/UI states/metrics]

Cleanup: [teardown/reset]

Coverage notes: [gaps, out-of-scope items, risk areas]

Non-functionals: [perf targets, security considerations, accessibility notes]

Data/fixtures: [test users, payloads, seeds]

Environments: [dev/stage/prod-parity requirements]

Ambiguity Questions:

- [Question 1 about unclear behavior]

- [Question 2 about edge case]

Potential Missed Ideas:

- [Suspicious area where tests might still be thin]

Friday, February 13, 2026

AI Prompt - Security Testing

AI Prompt

"List security test cases for [app/endpoint]. Cover authZ/authN, input validation, common vulns (XSS, SQLi, SSRF, CSRF, IDOR), transport security, logging, and misconfig checks. Include both positive and negative cases."

Applying Critical Thinking

·         Define the trust boundary: What is inside vs outside the system? Every entry point (HTTP, WebSocket, file upload, webhook) is an attack surface.

·         Assume breach for dependencies: Treat third-party libs, APIs, and configs as potentially compromised; test that failures don’t escalate.

·         Positive vs negative: Positive cases prove legitimate users get access and valid inputs succeed; negative cases prove attackers or invalid inputs are rejected or sanitized.

·         Chain thinking: One vuln (e.g. IDOR) can lead to another (e.g. data exfil); consider multi-step attack scenarios.

Generate Test Cases for Each Feature

·         AuthN: Valid login, token refresh, MFA success, session validity. Negative: invalid creds, expired/missing/revoked token, token reuse after logout.

·         AuthZ: Role X can access resource A; tenant isolation. Negative: role Y cannot access A; cross-tenant access; privilege escalation.

·         Input validation: Valid payloads accepted. Negative: oversized input, special chars, null/type confusion, boundary values.

·         XSS: Sanitized output in HTML/JS context. Negative: reflected, stored, and DOM-based XSS via user-controlled input.

·         SQLi: Parameterized queries with normal input. Negative: union-based, error-based, blind SQLi payloads.

·         SSRF: Allowed URLs only. Negative: internal IPs, cloud metadata, file://, redirects to internal.

·         CSRF: Request with valid origin/cookie + CSRF token. Negative: request without token, wrong origin, same-site cookie behavior.

·         IDOR: User A can only access A’s resources. Negative: user A accesses B’s resource by changing ID in path/body.

·         Transport: HTTPS only, HSTS, secure cookies. Negative: HTTP downgrade, mixed content, missing Secure/HttpOnly/SameSite cookie flags.

·         Logging: No PII/secrets in logs; errors don’t leak stack to client. Negative: logs contain passwords, tokens, full cards; verbose errors to user.

·         Misconfig: Secure defaults, no debug in prod. Negative: default creds, debug endpoints exposed, permissive CORS.
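
As an illustration of the SSRF negative cases above, here is a minimal sketch of a guard that a test suite could target. The function name and the allowed schemes are assumptions. It only inspects literal IP addresses in the URL, so hostnames that resolve to internal addresses, redirects, and alternative IP encodings are deliberately left as further test cases rather than handled here.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumed policy

def is_blocked_literal_target(url):
    """True if the URL uses a disallowed scheme or a literal internal IP."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return True
    try:
        ip = ipaddress.ip_address(parts.hostname)
    except ValueError:
        # Not a literal IP. A real guard must also resolve the hostname
        # and re-check the resulting addresses; this sketch does not.
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved

# Negative-case payloads taken from the list above.
payloads = [
    "http://169.254.169.254/latest/meta-data/",  # cloud metadata endpoint
    "http://127.0.0.1:8080/admin",               # loopback
    "http://10.0.0.5/",                          # private range
    "http://[::1]/",                             # IPv6 loopback
    "file:///etc/passwd",                        # non-HTTP scheme
]
all_blocked = all(is_blocked_literal_target(u) for u in payloads)
```

The positive case is the mirror image: an ordinary external HTTPS URL should pass this check and then go on to whatever allowlist the application enforces.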

Questions on Ambiguities

·         What roles/tenants exist, and what is the exact access matrix (who can read/write what)?

·         Are error messages intentionally generic for unauthenticated/unauthorized, or is some detail allowed for debugging?

·         What input length/type limits are enforced (and where: gateway, app, DB)?

·         Is CSRF protection applied to all state-changing operations, and is it token-based or SameSite-only?

·         Which headers (e.g. X-Forwarded-For, Host) are trusted for routing or logging, and could they be spoofed?

·         Are file uploads restricted by type/size and scanned; where are they stored and how are they served?

·         What secrets (API keys, DB URLs) are in config/env, and are they ever logged or exposed in errors?

Areas Where Test Ideas Might Be Missed

·         Business logic: rate limits, coupon reuse, price manipulation, workflow bypass (e.g. skip payment step).

·         Second-order injection: data stored from one request and executed in another context (e.g. stored XSS, stored SQLi in report generation).

·         Subdomain/redirect: open redirects, subdomain takeover, cookie scope too broad.

·         Dependency vulns: outdated libs with known CVEs; tests don’t cover “what if this dependency is malicious”.

·         Auth bypass via alternate paths: GraphQL introspection, internal APIs, webhooks without signature verification.

·         Time-based and conditional attacks: timing side channels, race conditions on balance/eligibility checks.

·         Client-side only checks: relying on front-end validation without server-side enforcement.
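
Two of the items above, webhooks without signature verification and timing side channels, meet in one small piece of code. The sketch below shows HMAC-SHA256 verification with a constant-time comparison using Python's standard library; the secret, the payload, and the hex-encoded signature format are assumptions, since each webhook provider defines its own scheme.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def is_valid_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Verify a received signature without leaking timing information."""
    expected = sign(secret, body)
    # compare_digest avoids the early exit of a plain == comparison.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))

# Hypothetical webhook fixture.
secret = b"test-secret"
body = b'{"event":"paid","amount":100}'
good_signature = sign(secret, body)

accepted = is_valid_signature(secret, body, good_signature)
tampered = is_valid_signature(secret, body.replace(b"100", b"1"), good_signature)
unsigned = is_valid_signature(secret, body, "")
wrong_key = is_valid_signature(b"other-secret", body, good_signature)
```

The matching test cases are one positive (valid signature accepted) and at least three negatives: modified body, missing signature, and a signature produced with a different key.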

Output Template

Context: [app/endpoint under test, entry points, dependencies, environment]

Assumptions: [auth method (JWT/OAuth/session), tenant model, WAF/gateway in front]

Test Types: [security, authN/authZ]

Test Cases:

ID: [TC-SEC-001]

Type: [security]

Title: [short name]

Preconditions/Setup: [data, env, mocks, flags]

Steps: [ordered steps or request details]

Variations: [inputs/edges/negative cases]

Expected Results: [responses/UI states/metrics]

Cleanup: [teardown/reset]

Coverage notes: [gaps, out-of-scope items, risk areas]

Non-functionals: [perf targets, security considerations, accessibility notes]

Data/fixtures: [test users, payloads, seeds]

Environments: [dev/stage/prod-parity requirements]

Ambiguity Questions:

- [Question 1 about unclear behavior]

- [Question 2 about edge case]

Potential Missed Ideas:

- [Suspicious area where tests might still be thin]

AI in Software Testing: How Artificial Intelligence Is Transforming QA

For years, software testing has lived under pressure: more features, faster releases, fewer bugs, smaller teams. Traditional QA has done her...