Skills Operations
Deployment, secret-readiness, collector invocation, smoke-check, and rollback runbook.
Last updated: 2026-05-22
Use this runbook to deploy SDMCC Skills, confirm data collection readiness, and avoid leaking secret material. Current operating scope is development hosting at skills.niaidcivics-dev.org with collector and recommendation schedules disabled until an approved Azure DevOps PAT secret is installed. The reserved production hostname is skills.niaidcivics.org, but production remains inactive until the team explicitly approves it.
Safety Rules
- Never print PATs, passwords, Cognito tokens, .env values, or raw deployment logs that could contain credentials.
- Treat AWS Secrets Manager values as secret-bearing even when checking whether a secret is ready.
- Enable collector schedules only when the target environment has an approved Azure DevOps PAT secret.
- Do not deploy production unless the team explicitly activates a production Skills environment.
Safe Secret Readiness Check
This pattern checks whether a secret still has the scaffold replacement marker without printing the secret:
aws secretsmanager get-secret-value \
--secret-id "$SKILLS_PAT_SECRET_ID" \
--query SecretString \
--output text |
node -e 'let input="";process.stdin.on("data",(c)=>input+=c);process.stdin.on("end",()=>{process.stdout.write(input.includes("REPLACE_WITH_APPROVED_AZURE_DEVOPS_PAT")?"placeholder\n":"non-placeholder\n")})'
Set AWS_PROFILE and AWS_REGION to the intended SDMCC account and region before running the check.
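The Node one-liner above reports non-placeholder for any input that lacks the marker, which would include empty output from a failed AWS CLI call. As a minimal sketch, assuming awk is available on the operator's machine, the same check can report that case separately. classify_secret is a hypothetical helper name, not part of the deploy scripts; it reads the secret on stdin and prints only a classification, never the value.

```shell
# Hypothetical helper (not part of the Skills repo).
# Reads a secret string on stdin and prints one of:
#   placeholder      the scaffold marker is still present
#   non-placeholder  some other non-empty value is present
#   empty            nothing was read (for example, the AWS call failed)
classify_secret() {
  awk -v marker='REPLACE_WITH_APPROVED_AZURE_DEVOPS_PAT' '
    length($0) > 0 { seen = 1 }
    index($0, marker) > 0 { placeholder = 1 }
    END {
      if (placeholder) print "placeholder"
      else if (seen) print "non-placeholder"
      else print "empty"
    }'
}
```

To use it, pipe the output of the aws secretsmanager get-secret-value command into classify_secret in place of the node command. Treat empty the same as placeholder: the secret is not ready.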
First Development Deploy
A safe scaffold deployment can omit AZURE_DEVOPS_PAT_SECRET_ARN; the deploy helper creates or reuses a placeholder secret and deploys with collector and recommendation schedules disabled. For skills-sdmcc-dev, the deploy helper defaults to skills.niaidcivics-dev.org and resolves the matching public Route 53 hosted zone automatically.
AWS_PROFILE=SDMCC-DEV-New \
AWS_REGION=us-east-1 \
STACK_NAME=skills-sdmcc-dev \
ENVIRONMENT_NAME=skills-sdmcc \
scripts/deploy-skills-aws.sh
To override the development hostname, set DASHBOARD_DOMAIN_NAME; the hosted zone ID can normally be omitted when the domain is under a public Route 53 hosted zone in the target account:
AWS_PROFILE=SDMCC-DEV-New \
AWS_REGION=us-east-1 \
STACK_NAME=skills-sdmcc-dev \
ENVIRONMENT_NAME=skills-sdmcc \
COGNITO_DOMAIN_PREFIX=skills-sdmcc-dev-<aws-account-id> \
DASHBOARD_DOMAIN_NAME=skills.niaidcivics-dev.org \
scripts/deploy-skills-aws.sh
For a temporary emergency test stack without the meaningful hostname, set DASHBOARD_DOMAIN_NAME=none.
To enable collector schedules, first create an approved PAT secret outside this repo, verify that it is not the placeholder, then pass its ARN:
AWS_PROFILE=SDMCC-DEV-New \
AWS_REGION=us-east-1 \
STACK_NAME=skills-sdmcc-dev \
ENVIRONMENT_NAME=skills-sdmcc \
AZURE_DEVOPS_PAT_SECRET_ARN="$SDMCC_SKILLS_PAT_SECRET_ARN" \
scripts/deploy-skills-aws.sh
Do not pass PAT values directly to the script.
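One way to guard against passing a raw PAT by mistake is to confirm that the value has the shape of a Secrets Manager ARN before deploying. The functions below are a hypothetical pre-flight sketch, not part of scripts/deploy-skills-aws.sh; they check only the ARN prefix and never echo the value being tested.

```shell
# Hypothetical pre-flight helpers (names are illustrative).
# Succeeds only when the argument looks like a Secrets Manager ARN:
#   arn:<partition>:secretsmanager:<region>:<account>:secret:<name>
is_secrets_manager_arn() {
  case "$1" in
    arn:aws*:secretsmanager:*:*:secret:?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Fails with a message on stderr that does not include the rejected value.
require_secret_arn() {
  if ! is_secrets_manager_arn "$1"; then
    echo "AZURE_DEVOPS_PAT_SECRET_ARN does not look like a Secrets Manager ARN; refusing to deploy." >&2
    return 1
  fi
}
```

Calling require_secret_arn "$SDMCC_SKILLS_PAT_SECRET_ARN" before the deploy command stops the run early when the variable holds anything other than an ARN. The shape check does not prove that the secret exists or is approved, so the readiness check is still required.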
Production Deploy
Do not deploy production unless the team explicitly activates a production Skills environment. When approved, use the production account and stack name so the deploy helper defaults to the reserved production hostname, skills.niaidcivics.org.
AWS_PROFILE=SDMCC-PROD-New \
AWS_REGION=us-east-1 \
STACK_NAME=skills-sdmcc-prod \
ENVIRONMENT_NAME=skills-sdmcc-prod \
scripts/deploy-skills-aws.sh
Manual Collector Run
After a development deploy with an approved secret, invoke the collector once before trusting the schedule:
aws lambda invoke \
--function-name "$SKILLS_COLLECTOR_FUNCTION_NAME" \
--payload '{}' \
--cli-binary-format raw-in-base64-out \
/tmp/skills-collector-result.json
Then run the recommendation function:
aws lambda invoke \
--function-name "$SKILLS_RECOMMENDATION_FUNCTION_NAME" \
--payload '{}' \
--cli-binary-format raw-in-base64-out \
/tmp/skills-recommendation-result.json
The Lambda responses should contain counts and status fields only. If an invocation fails because of source access or secret readiness, disable schedules before finishing the ship.
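aws lambda invoke writes the function's response to the output file and prints invocation metadata to stdout; that metadata includes a FunctionError field when the function raised an error. As a minimal sketch, assuming the metadata was redirected to a file, a failed invocation can be detected without opening the response payload. invoke_reported_error is a hypothetical helper, and the metadata path mentioned below is illustrative.

```shell
# Hypothetical helper (not part of the Skills repo).
# $1: file containing the JSON metadata that `aws lambda invoke` printed to stdout.
# Succeeds (exit 0) when the metadata reports a function error.
invoke_reported_error() {
  grep -q '"FunctionError"' "$1"
}
```

For example, append > /tmp/skills-collector-invoke.json to the collector invoke command, then run invoke_reported_error /tmp/skills-collector-invoke.json. If it succeeds, treat the invocation as failed and disable the schedules.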
Smoke Checks
Run these checks after each local build:
npm run site:check
npm run site:dev
curl -fsS http://127.0.0.1:4173/site/data/skills.json
Run these checks after each hosted deploy:
curl -fsSI "$SKILLS_DASHBOARD_URL/"
curl -fsS "$SKILLS_DASHBOARD_URL/data/skills.json"
For development, SKILLS_DASHBOARD_URL should be https://skills.niaidcivics-dev.org. When the CloudFront default distribution domain is known, confirm it redirects to the meaningful hostname:
curl -sI "https://$SKILLS_CLOUDFRONT_DOMAIN/index.html?host-check=1" | sed -n '1,8p'
Expected result: a 301 response whose Location header points to the configured SDMCC dashboard hostname.
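The redirect expectation can also be checked mechanically instead of by eye. The sketch below is a hypothetical helper, not part of the repo: check_redirect reads response headers on stdin and succeeds only when the status is 301 and the Location header targets the given hostname. It assumes the redirect uses https, which the runbook does not state explicitly.

```shell
# Hypothetical helper (not part of the Skills repo).
# $1: expected hostname; stdin: response headers, e.g. from `curl -sI`.
# Assumption: the redirect target uses the https scheme.
check_redirect() {
  awk -v host="$1" '
    { sub(/\r$/, "") }                       # strip the CR from CRLF line endings
    NR == 1 && $2 == "301" { status_ok = 1 } # status line, e.g. "HTTP/2 301"
    tolower($1) == "location:" {
      prefix = "https://" host
      if (index($2, prefix) == 1) {
        # The host must end here or be followed by a path, not a longer hostname.
        next_char = substr($2, length(prefix) + 1, 1)
        if (next_char == "" || next_char == "/") location_ok = 1
      }
    }
    END { exit !(status_ok && location_ok) }'
}
```

To use it, pipe the curl -sI command for the CloudFront domain into check_redirect skills.niaidcivics-dev.org in place of the sed filter, and check the exit status.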
Rollback
- Static-site rollback: redeploy from the previous merge commit and invalidate CloudFront.
- Collector safety rollback: redeploy without AZURE_DEVOPS_PAT_SECRET_ARN, or set COLLECTOR_SCHEDULE_STATE=DISABLED and RECOMMENDATION_SCHEDULE_STATE=DISABLED.
- CloudFront hostname issue: remove the custom-domain parameters only for an emergency test stack; the active development hostname should normally stay on the meaningful URL once approved.
Record every deploy, rollback, or schedule-state change in Skills Status.