Analytics Data Governance

Field-level privacy policy, retention targets, redaction rules, and LLM recommendation boundaries.

Skills analytics uses a strict-minimum data policy. The dashboard should be useful for team workflow improvement without storing secrets, raw prompts, raw transcripts, local machine details, deployment logs, work item descriptions, or full Azure Boards comments.

See Agent Analytics Metadata Contract for parser rules and Skills Metric Dictionary for dashboard metric meanings.

Classification Model

Classification	Meaning	Handling
Allowed	Safe enough for MVP analytics when sourced from metadata or narrow Azure Boards fields.	May be stored in normalized records and returned by authorized dashboard APIs.
Controlled	Useful, but may need role-aware display, retention limits, or additional review.	Store only when explicitly listed and keep out of public surfaces.
Derived	Calculated from allowed/controlled fields.	Store or return with source evidence and no raw sensitive inputs.
Prohibited	Sensitive, high-risk, or too broad for the MVP purpose.	Reject, omit, redact, or keep transient only long enough to parse safely.

Field-Level Policy

Field Or Data	Classification	Storage Rule	Notes
`schema`, `eventType`, `occurredAt`	Allowed	Store in normalized records.	Required for parser and metric grouping.
`workItemId`, `workItemType`, `parentId`, `state`	Allowed	Store in normalized records and dashboard APIs.	IDs and workflow state are core dashboard evidence.
Azure Boards work item URL	Allowed	Store normalized URL only.	Do not include auth tokens or query strings with credentials.
Work item title	Controlled	Allowed for MVP signed-in dashboard roles; do not show on public pages.	If a project later treats titles as sensitive, mask by role or use title hashes.
Work item description, acceptance criteria, full discussion comment	Prohibited	Do not store in metadata, normalized records, logs, or LLM inputs.	The source adapter may hold comments transiently in memory only to extract marked metadata blocks.
`skillName`, `skillVersion`, `agentHost`, `model`	Allowed	Store in normalized records.	Unknown/unsupported values should be `unknown`, not inferred.
Skill field-report `skillId`, `category`, `rating`, timestamp, and source label	Allowed	Store in governed field-report records and authenticated summary APIs.	Categories are limited to the approved list in Skill Field Reports.
Skill field-report comment	Controlled	Store only a short sanitized comment. Keep it out of public surfaces.	No raw prompts, raw transcripts, raw logs, credentials, full Azure Boards comments, or work item text.
`actorKind`, `createdByKind`, `completedByKind`, `attributionKind`, confidence, reason	Allowed	Store normalized attribution fields.	Attribution should stay conservative when evidence is incomplete.
`tokenEstimate`	Controlled	Store only best-effort estimates with `isBillingRecord=false`.	Unavailable estimates must omit token counts. Never treat as invoices or billing records.
Cost estimate	Derived	Store only after a versioned pricing table is approved.	Cost is directional and must cite pricing version/date.
`summary`	Controlled	Allow short sanitized summaries up to the parser contract limit.	No raw prompt text, transcript text, description excerpts, or secrets.
`source.commentId`, `commentCreatedAt`, parser offset	Allowed	Store safe source pointers.	Enables audit/debugging without raw comment content.
Parse failure code and message	Allowed	Store safe code/message only.	Do not include raw invalid JSON or surrounding comment text.
Azure Boards user display name, email, descriptor	Controlled	Omit from MVP analytics records.	If later needed, use approved mapping or a non-reversible project-scoped hash.
Cognito username, email, groups, ID token claims	Controlled	Use for API authorization; do not copy into analytics events.	Admin audit trails are separate from analytics records.
PATs, passwords, API keys, bearer tokens, client secrets, auth headers	Prohibited	Reject/omit and never log.	This includes `.env*`, `.mcp.json`, local auth files, and deployment output that may contain credentials.
Raw prompts, raw transcripts, session logs, meeting recordings	Prohibited	Do not store or send to recommendation jobs.	Token counters may be read locally, but raw session content must not be persisted.
Local filesystem paths, machine names, OS/user profile details	Prohibited	Do not store.	Host support should be represented by `agentHost`, not local paths.
Deployment logs	Prohibited	Do not store in analytics.	Deployment evidence can be summarized in tickets without raw log content.
Recommendation title, summary, evidence metrics, confidence	Derived	Store recommendation records when generated from normalized/aggregate evidence.	Recommendations must not quote raw comments, prompts, or descriptions.
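
The classification table can be encoded as a lookup that fails closed. The sketch below is illustrative only: `FIELD_POLICY`, `classify`, and `stripProhibited` are hypothetical names, and keys such as `title`, `costEstimate`, `description`, and `acceptanceCriteria` are assumed spellings rather than names taken from the parser contract. Any field that is not explicitly listed is treated as Prohibited.

```typescript
type Classification = "allowed" | "controlled" | "derived" | "prohibited";

// Hypothetical subset of the field-level policy; key spellings are assumptions.
const FIELD_POLICY: ReadonlyMap<string, Classification> = new Map<string, Classification>([
  ["schema", "allowed"],
  ["eventType", "allowed"],
  ["occurredAt", "allowed"],
  ["workItemId", "allowed"],
  ["workItemType", "allowed"],
  ["parentId", "allowed"],
  ["state", "allowed"],
  ["skillName", "allowed"],
  ["skillVersion", "allowed"],
  ["agentHost", "allowed"],
  ["model", "allowed"],
  ["title", "controlled"],
  ["summary", "controlled"],
  ["tokenEstimate", "controlled"],
  ["costEstimate", "derived"],
  ["description", "prohibited"],
  ["acceptanceCriteria", "prohibited"],
]);

// Fields missing from the policy are treated as prohibited (fail closed).
function classify(field: string): Classification {
  return FIELD_POLICY.get(field) ?? "prohibited";
}

// Return a copy of the record without prohibited or unlisted fields.
function stripProhibited(record: Record<string, unknown>): Record<string, unknown> {
  const kept: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    if (classify(key) !== "prohibited") {
      kept[key] = value;
    }
  }
  return kept;
}
```

With this shape, a new field cannot reach storage until someone adds it to the policy, which matches the "store only when explicitly listed" handling for Controlled data.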

User Identity Rule

MVP analytics should not store direct human user identifiers in event records. Azure Boards identities may be used transiently for conservative attribution, and Cognito claims may be used for authorization, but dashboard analytics records should use actor categories such as agent, human, mixed, and unknown.

If future metrics require user-level reporting, add a separate design decision before implementation. The default future path is an approved mapping table or a project-scoped non-reversible hash, not raw email addresses in analytics rows.
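
If a project-scoped non-reversible hash is ever approved, one possible shape is a keyed HMAC over the project and user identifier. This is a sketch under assumptions, not a decided design: `pseudonymizeUser`, its parameters, and the idea of a per-deployment secret key are all hypothetical, and key storage and rotation are out of scope here.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: derive a stable pseudonym for a user within one project.
// The secret key is assumed to live in a secrets manager, never in analytics rows.
function pseudonymizeUser(projectId: string, userId: string, secretKey: string): string {
  // Normalize so "Dev@Example.com " and "dev@example.com" map to the same pseudonym.
  const normalizedUser = userId.trim().toLowerCase();
  return createHmac("sha256", secretKey)
    .update(`${projectId}\n${normalizedUser}`)
    .digest("hex");
}
```

Including the project identifier in the hashed input means the same person gets different pseudonyms in different projects, so rows cannot be joined across projects by hash value alone.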

Retention Targets

These targets are policy inputs for storage, API, collector, and operations work. If infrastructure cannot enforce one yet, track the enforcement gap before production launch.

Data Set	Target Retention	Notes
Normalized analytics records in DynamoDB	18 months	Long enough for trend review across releases without becoming broad history.
Normalized accepted-record archive in S3	180 days	Archive stores normalized accepted records, not raw Azure Boards comments.
Collector run health records	180 days	Needed for freshness and operational trend checks.
Parse failure records	90 days	Store safe codes/messages only; use for cleanup and training.
CloudWatch Lambda/API logs	30 days	Existing infrastructure uses short operational log retention. Logs must avoid raw comments and credentials.
Stored recommendation records	18 months	Keep generated guidance and evidence metrics with the same trend horizon as analytics.
LLM recommendation request payloads	Transient only	Do not persist raw request bodies unless they contain only approved aggregate evidence and have explicit retention.
Sanitized local fixtures and docs examples	Repository lifetime	Fixtures must remain sanitized and credential-free.
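
For data sets enforced with DynamoDB TTL, the expiry attribute must be a number holding epoch seconds. The following sketch computes that value from the targets above; the data set keys, `RETENTION`, and `expiresAtEpochSeconds` are hypothetical names, and which store each data set actually lives in is an assumption.

```typescript
type DataSet = "normalizedRecord" | "collectorRunHealth" | "parseFailure" | "recommendation";
type RetentionRule = { unit: "days" | "months"; amount: number };

// Targets copied from the retention table; key names are illustrative.
const RETENTION: Readonly<Record<DataSet, RetentionRule>> = {
  normalizedRecord: { unit: "months", amount: 18 },
  collectorRunHealth: { unit: "days", amount: 180 },
  parseFailure: { unit: "days", amount: 90 },
  recommendation: { unit: "months", amount: 18 },
};

// Compute the TTL value (epoch seconds, UTC) for a record created at `createdAt`.
function expiresAtEpochSeconds(dataSet: DataSet, createdAt: Date): number {
  const rule = RETENTION[dataSet];
  const expiry = new Date(createdAt.getTime());
  if (rule.unit === "months") {
    // Month-end dates can roll forward a few days (for example the 31st into a
    // shorter month), which only lengthens retention slightly.
    expiry.setUTCMonth(expiry.getUTCMonth() + rule.amount);
  } else {
    expiry.setUTCDate(expiry.getUTCDate() + rule.amount);
  }
  return Math.floor(expiry.getTime() / 1000);
}
```

The S3 archive target would be handled by a lifecycle rule rather than a per-record attribute, so it is left out of this sketch.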

Redaction Rules

Parser failures may include work item ID, work item type, comment ID, safe error code, safe message, and parser offset.
Parser failures must not include raw comment text, invalid JSON payloads, prompt text, transcript text, descriptions, credentials, or surrounding discussion content.
Collector/API logs should report counts, identifiers, and safe error names.
Any error path that catches an upstream exception must mask credential material before returning or logging it.
Linear, PR, and deployment evidence should summarize validation and smoke checks without pasting raw secret-bearing logs.
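
The masking rule for caught upstream exceptions could look roughly like the sketch below. The patterns are assumptions about what credential material tends to look like in error strings; they are not exhaustive and deliberately over-redact. `maskCredentials` and `toSafeLogEntry` are hypothetical names.

```typescript
// Each entry pairs a pattern with its replacement. Illustrative, not exhaustive.
const CREDENTIAL_PATTERNS: ReadonlyArray<[RegExp, string]> = [
  // "Bearer <token>" or "Basic <token>" in header-like text.
  [/\b(Bearer|Basic)\s+[A-Za-z0-9._~+\/=-]+/gi, "$1 [REDACTED]"],
  // key=value or key: value pairs whose key suggests a secret.
  [/((?:password|passwd|secret|token|api[_-]?key)\w*)(\s*[:=]\s*)("[^"]*"|'[^']*'|[^\s,;&]+)/gi, "$1$2[REDACTED]"],
  // user:password@ embedded in URLs.
  [/(https?:\/\/)[^\/\s:@]+:[^\/\s@]+@/gi, "$1[REDACTED]@"],
];

// Replace anything that looks like credential material with a fixed marker.
function maskCredentials(text: string): string {
  let masked = text;
  for (const [pattern, replacement] of CREDENTIAL_PATTERNS) {
    masked = masked.replace(pattern, replacement);
  }
  return masked;
}

// Reduce an unknown thrown value to a safe error name and a masked message.
function toSafeLogEntry(error: unknown): { name: string; message: string } {
  const name = error instanceof Error ? error.name : "UnknownError";
  const message = error instanceof Error ? error.message : String(error);
  return { name, message: maskCredentials(message) };
}
```

Masking is a backstop. The stronger control is still to log counts, identifiers, and safe error names instead of upstream messages wherever that is practical.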

Title Visibility Decision

For MVP, work item titles are acceptable on authenticated Skills dashboard surfaces for the configured Cognito groups. Public library and documentation pages must not expose live Azure Boards titles. Role-based title masking is not required for MVP, but the API should keep title handling centralized so a future project can mask or hash titles for narrower roles.
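
One way to keep title handling centralized is to route every title through a single function keyed by surface. This is a minimal sketch with hypothetical names (`Surface`, `TITLE_POLICY`, `presentTitle`); the real surface and role model may differ.

```typescript
type Surface = "authenticated-dashboard" | "public";
type TitlePolicy = "show" | "omit";

// MVP decision: titles appear on authenticated dashboard surfaces and nowhere else.
const TITLE_POLICY: Readonly<Record<Surface, TitlePolicy>> = {
  "authenticated-dashboard": "show",
  public: "omit",
};

// Single place where a live title is allowed to leave the API layer.
function presentTitle(title: string, surface: Surface): string | null {
  return TITLE_POLICY[surface] === "show" ? title : null;
}
```

A later project that needs masking or hashing for narrower roles could extend `TitlePolicy` and this one function without touching individual handlers.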

LLM Recommendation Boundaries

Recommendation jobs may use aggregate counts, normalized event fields, safe work item IDs/links, sanitized summaries, token estimate availability, parse failure codes, and collector run metrics. They must not use raw Azure Boards comments, work item descriptions, raw prompts, transcripts, local paths, credentials, deployment logs, or direct user identifiers.

Recommendation outputs must cite dashboard evidence rather than source-sensitive text. Suggested process changes remain recommendations for human review; the system must not automatically rewrite skills, board states, or team workflows from LLM output.
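
As a defense-in-depth check before a recommendation job sends anything to an LLM, the payload can be scanned for keys that should never appear. The sketch below assumes hypothetical key names and helper names (`PROHIBITED_KEYS`, `findProhibitedKeys`, `assertSafeRecommendationInput`). It complements building the payload from approved evidence only and does not replace it.

```typescript
// Hypothetical key names; a real guard would list whatever the collector can emit.
const PROHIBITED_KEYS: ReadonlySet<string> = new Set([
  "rawComment", "description", "acceptanceCriteria", "prompt", "transcript",
  "localPath", "deploymentLog", "email", "displayName", "token",
]);

// Walk the payload and collect the paths of any prohibited keys.
function findProhibitedKeys(value: unknown, path: string = "$"): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      hits.push(...findProhibitedKeys(item, `${path}[${index}]`));
    });
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      const childPath = `${path}.${key}`;
      if (PROHIBITED_KEYS.has(key)) {
        hits.push(childPath);
      } else {
        hits.push(...findProhibitedKeys(child, childPath));
      }
    }
  }
  return hits;
}

// Throw before the request is sent if any prohibited key is present.
function assertSafeRecommendationInput(payload: unknown): void {
  const hits = findProhibitedKeys(payload);
  if (hits.length > 0) {
    // Report only key paths, never the offending values.
    throw new Error(`Recommendation input contains prohibited fields: ${hits.join(", ")}`);
  }
}
```

The error message carries key paths only, so a rejected payload does not leak its contents into logs.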

Guidance For Authors And Operators

Skill authors should write only valid `agent-analytics:v1` blocks and run `npm run analytics:docs` when changing published examples.
Collector changes should preserve the transient raw-comment boundary: parse in memory, then discard raw source text.
Dashboard/API changes should allowlist response fields explicitly rather than return stored objects wholesale.
Storage changes should implement retention using DynamoDB TTL, S3 lifecycle, or an explicit cleanup job before production launch.
Operators should treat `.env*`, auth files, deployment logs, PATs, Cognito tokens, and AWS credentials as secret-bearing even when troubleshooting analytics.
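
For the response-field guidance above, a handler can copy named fields out of the stored item instead of returning the item itself. This is a sketch only: `RESPONSE_FIELDS` and `toDashboardResponse` are hypothetical, and the field list is an illustrative subset of the Allowed fields.

```typescript
// Fields the API is permitted to return; anything else in storage stays in storage.
const RESPONSE_FIELDS = [
  "workItemId", "workItemType", "state", "eventType", "occurredAt",
  "skillName", "skillVersion", "agentHost", "model",
] as const;

type ResponseField = (typeof RESPONSE_FIELDS)[number];
type DashboardResponse = Partial<Record<ResponseField, unknown>>;

// Build the response by copying approved fields only.
function toDashboardResponse(stored: Record<string, unknown>): DashboardResponse {
  const response: DashboardResponse = {};
  for (const field of RESPONSE_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(stored, field)) {
      response[field] = stored[field];
    }
  }
  return response;
}
```

With this approach, a new attribute added to storage stays out of API responses until someone adds it to the list, so exposing it is a reviewable change.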