Analytics Data Governance
Field-level privacy policy, retention targets, redaction rules, and LLM recommendation boundaries.
Analytics Data Governance
Skills analytics uses a strict-minimum data policy. The dashboard should be useful for team workflow improvement without storing secrets, raw prompts, raw transcripts, local machine details, deployment logs, work item descriptions, or full Azure Boards comments.
See Agent Analytics Metadata Contract for parser rules and Skills Metric Dictionary for dashboard metric meanings.
Classification Model
| Classification | Meaning | Handling |
|---|---|---|
| Allowed | Safe enough for MVP analytics when sourced from metadata or narrow Azure Boards fields. | May be stored in normalized records and returned by authorized dashboard APIs. |
| Controlled | Useful, but may need role-aware display, retention limits, or additional review. | Store only when explicitly listed and keep out of public surfaces. |
| Derived | Calculated from allowed/controlled fields. | Store or return with source evidence and no raw sensitive inputs. |
| Prohibited | Sensitive, high-risk, or too broad for the MVP purpose. | Reject, omit, redact, or keep transient only long enough to parse safely. |
Field-Level Policy
| Field Or Data | Classification | Storage Rule | Notes |
|---|---|---|---|
schema, eventType, occurredAt | Allowed | Store in normalized records. | Required for parser and metric grouping. |
workItemId, workItemType, parentId, state | Allowed | Store in normalized records and dashboard APIs. | IDs and workflow state are core dashboard evidence. |
| Azure Boards work item URL | Allowed | Store normalized URL only. | Do not include auth tokens or query strings with credentials. |
| Work item title | Controlled | Allowed for MVP signed-in dashboard roles; do not show on public pages. | If a project later treats titles as sensitive, mask by role or use title hashes. |
| Work item description, acceptance criteria, full discussion comment | Prohibited | Do not store in metadata, normalized records, logs, or LLM inputs. | The source adapter may hold comments transiently in memory only to extract marked metadata blocks. |
skillName, skillVersion, agentHost, model | Allowed | Store in normalized records. | Unknown/unsupported values should be unknown, not inferred. |
actorKind, createdByKind, completedByKind, attributionKind, confidence, reason | Allowed | Store normalized attribution fields. | Attribution should stay conservative when evidence is incomplete. |
tokenEstimate | Controlled | Store only best-effort estimates with isBillingRecord=false. | Unavailable estimates must omit token counts. Never treat as invoices or billing records. |
| Cost estimate | Derived | Store only after a versioned pricing table is approved. | Cost is directional and must cite pricing version/date. |
summary | Controlled | Allow short sanitized summaries up to the parser contract limit. | No raw prompt text, transcript text, description excerpts, or secrets. |
source.commentId, commentCreatedAt, parser offset | Allowed | Store safe source pointers. | Enables audit/debugging without raw comment content. |
| Parse failure code and message | Allowed | Store safe code/message only. | Do not include raw invalid JSON or surrounding comment text. |
| Azure Boards user display name, email, descriptor | Controlled | Omit from MVP analytics records. | If later needed, use approved mapping or a non-reversible project-scoped hash. |
| Cognito username, email, groups, ID token claims | Controlled | Use for API authorization; do not copy into analytics events. | Admin audit trails are separate from analytics records. |
| PATs, passwords, API keys, bearer tokens, client secrets, auth headers | Prohibited | Reject/omit and never log. | This includes .env*, .mcp.json, local auth files, and deployment output that may contain credentials. |
| Raw prompts, raw transcripts, session logs, meeting recordings | Prohibited | Do not store or send to recommendation jobs. | Token counters may be read locally, but raw session content must not be persisted. |
| Local filesystem paths, machine names, OS/user profile details | Prohibited | Do not store. | Host support should be represented by agentHost, not local paths. |
| Deployment logs | Prohibited | Do not store in analytics. | Deployment evidence can be summarized in tickets without raw log content. |
| Recommendation title, summary, evidence metrics, confidence | Derived | Store recommendation records when generated from normalized/aggregate evidence. | Recommendations must not quote raw comments, prompts, or descriptions. |
User Identity Rule
MVP analytics should not store direct human user identifiers in event records. Azure Boards identities may be used transiently for conservative attribution, and Cognito claims may be used for authorization, but dashboard analytics records should use actor categories such as agent, human, mixed, and unknown.
If future metrics require user-level reporting, add a separate design decision before implementation. The default future path is an approved mapping table or a project-scoped non-reversible hash, not raw email addresses in analytics rows.
Retention Targets
These targets are policy inputs for storage, API, collector, and operations work. If infrastructure cannot enforce one yet, track the enforcement gap before production launch.
| Data Set | Target Retention | Notes |
|---|---|---|
| Normalized analytics records in DynamoDB | 18 months | Long enough for trend review across releases without becoming broad history. |
| Normalized accepted-record archive in S3 | 180 days | Archive stores normalized accepted records, not raw Azure Boards comments. |
| Collector run health records | 180 days | Needed for freshness and operational trend checks. |
| Parse failure records | 90 days | Store safe codes/messages only; use for cleanup and training. |
| CloudWatch Lambda/API logs | 30 days | Existing infrastructure uses short operational log retention. Logs must avoid raw comments and credentials. |
| Stored recommendation records | 18 months | Keep generated guidance and evidence metrics with the same trend horizon as analytics. |
| LLM recommendation request payloads | Transient only | Do not persist raw request bodies unless they contain only approved aggregate evidence and have explicit retention. |
| Sanitized local fixtures and docs examples | Repository lifetime | Fixtures must remain sanitized and credential-free. |
Redaction Rules
- Parser failures may include work item ID, work item type, comment ID, safe error code, safe message, and parser offset.
- Parser failures must not include raw comment text, invalid JSON payloads, prompt text, transcript text, descriptions, credentials, or surrounding discussion content.
- Collector/API logs should report counts, identifiers, and safe error names.
- Any error path that catches an upstream exception must mask credential material before returning or logging it.
- Linear, PR, and deployment evidence should summarize validation and smoke checks without pasting raw secret-bearing logs.
Title Visibility Decision
For MVP, work item titles are acceptable on authenticated Skills dashboard surfaces for the configured Cognito groups. Public library and documentation pages must not expose live Azure Boards titles. Role-based title masking is not required for MVP, but the API should keep title handling centralized so a future project can mask or hash titles for narrower roles.
LLM Recommendation Boundaries
Recommendation jobs may use aggregate counts, normalized event fields, safe work item IDs/links, sanitized summaries, token estimate availability, parse failure codes, and collector run metrics. They must not use raw Azure Boards comments, work item descriptions, raw prompts, transcripts, local paths, credentials, deployment logs, or direct user identifiers.
Recommendation outputs must cite dashboard evidence rather than source-sensitive text. Suggested process changes remain recommendations for human review; the system must not automatically rewrite skills, board states, or team workflows from LLM output.
Guidance For Authors And Operators
- Skill authors should write only valid
agent-analytics:v1blocks and usenpm run analytics:docswhen changing published examples. - Collector changes should preserve the transient raw-comment boundary: parse in memory, then discard raw source text.
- Dashboard/API changes should whitelist response fields rather than return stored objects wholesale.
- Storage changes should implement retention using DynamoDB TTL, S3 lifecycle, or an explicit cleanup job before production launch.
- Operators should treat
.env*, auth files, deployment logs, PATs, Cognito tokens, and AWS credentials as secret-bearing even when troubleshooting analytics.