AWS Machine Learning

AWS offers ML services across the spectrum: pre-trained AI APIs (Rekognition, Comprehend, Polly) that require no ML expertise, SageMaker for building custom models, and Bedrock for foundation models and LLM applications.

Service Map

| Service | Type | Use Case |
|---|---|---|
| AI Services | Pre-trained APIs | Vision, NLP, speech, document, contact center |
| Bedrock | Foundation Models | LLMs, RAG, agents, image generation |
| SageMaker | ML Platform | Build, train, deploy custom models |
| SageMaker Canvas | No-Code ML | Business analyst predictions |
| Rekognition | Vision AI | Image/video analysis, face comparison |
| Comprehend | NLP AI | Text extraction, sentiment, entities, topics |

ML Stack

Pre-built AI APIs (AI Services)
  │ Rekognition, Comprehend, Polly, Translate, Transcribe, Textract, Lex, Kendra
  │ Zero ML expertise needed. Pay per use. Easy API access.
  ▼
Foundation Models (Bedrock)
  │ Claude, Llama, Mistral, Stable Diffusion, Titan
  │ Pre-trained on massive data. Fine-tune or RAG. API access.
  ▼
ML Platform (SageMaker)
  │ Jupyter, training, inference, edge deployment
  │ Full control. Data scientists. Bring your own model.
  ▼
ML Infrastructure
  │ EC2 GPU instances, EFA networking, Trainium/Inferentia chips
  │ Raw compute. When you need maximum control.
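
To make the top layer of the stack concrete, here is a minimal sketch of what "easy API access" looks like for a pre-built AI API. It assumes Amazon Comprehend's `detect_sentiment` call as exposed by a boto3 client; `classify_sentiment`, `SentimentClient`, and `FakeComprehend` are hypothetical names introduced for illustration, and the fake client exists only so the sketch runs without AWS credentials.

```python
from typing import Any, Protocol


class SentimentClient(Protocol):
    """Anything exposing Comprehend's detect_sentiment call (e.g. a boto3 client)."""

    def detect_sentiment(self, *, Text: str, LanguageCode: str) -> dict[str, Any]: ...


def classify_sentiment(client: SentimentClient, text: str, language: str = "en") -> str:
    """Return the overall sentiment label the service assigns to `text`."""
    response = client.detect_sentiment(Text=text, LanguageCode=language)
    # Comprehend reports one of: POSITIVE, NEGATIVE, NEUTRAL, MIXED
    return response["Sentiment"]


class FakeComprehend:
    """Offline stand-in (hypothetical) that mimics the response shape."""

    def detect_sentiment(self, *, Text: str, LanguageCode: str) -> dict[str, Any]:
        label = "POSITIVE" if "great" in Text.lower() else "NEUTRAL"
        return {"Sentiment": label, "SentimentScore": {}}


if __name__ == "__main__":
    print(classify_sentiment(FakeComprehend(), "The new release is great"))
```

With real credentials, the same function should accept `boto3.client("comprehend")` in place of the fake. No model training or hosting is involved, which is the point of this layer.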

Choosing an ML Approach

Do you need to build a custom model?
  │
  ├── NO (use existing model)
  │   ├── Pre-built API (AI Services) → Rekognition, Comprehend, Polly, etc.
  │   └── Foundation Model (Bedrock) → Claude, Llama, Mistral, Stable Diffusion
  │
  └── YES (build custom)
      ├── No-code (SageMaker Canvas) → Business analysts
      └── Full platform (SageMaker) → Data scientists
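
The decision tree above can be sketched as a small function. The function name and its parameters are hypothetical. The second-level questions are also assumptions: the tree names the branches but not the criteria, so the sketch uses generative vs. task-specific for the NO branch and data scientists vs. business analysts for the YES branch.

```python
def choose_ml_approach(
    needs_custom_model: bool,
    needs_generative: bool = False,
    has_data_scientists: bool = False,
) -> str:
    """Map the decision tree to a service family.

    Assumption: generative or open-ended tasks go to Bedrock, while
    task-specific ones (vision, NLP, speech) go to the pre-built AI Services.
    """
    if not needs_custom_model:
        # NO branch: reuse an existing model
        return "Bedrock" if needs_generative else "AI Services"
    # YES branch: build a custom model
    return "SageMaker" if has_data_scientists else "SageMaker Canvas"


if __name__ == "__main__":
    print(choose_ml_approach(needs_custom_model=False))
    print(choose_ml_approach(needs_custom_model=True, has_data_scientists=True))
```

Real decisions usually weigh more factors (latency, cost, data sensitivity), so treat this as a restatement of the tree, not a selection tool.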

Security and Compliance

| Consideration | Implementation |
|---|---|
| Data residency | SageMaker processing jobs can be configured to run in your VPC |
| Model ownership | You own your models and data |
| Encryption | KMS for models at rest, TLS in transit |
| Access control | IAM for API access, SageMaker for notebook access |
| Audit | CloudTrail for API calls; SageMaker training job logs in CloudWatch |
| Compliance | HIPAA, GDPR, FedRAMP (varies by service) |
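
As a rough illustration of the encryption and VPC rows, the sketch below assembles the security-related fields of a SageMaker `CreateTrainingJob` request. The field names follow that API as I understand it; the helper name `secure_training_settings` and the example IDs are hypothetical. The result is a fragment to merge into a full request, not a complete job definition (for instance, `OutputDataConfig` also needs an S3 output path).

```python
from collections.abc import Sequence


def secure_training_settings(
    kms_key_id: str,
    subnet_ids: Sequence[str],
    security_group_ids: Sequence[str],
) -> dict:
    """Build the security-related part of a training job request."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("a VPC config needs at least one subnet and one security group")
    return {
        # Encrypt model artifacts written to S3
        "OutputDataConfig": {"KmsKeyId": kms_key_id},
        # Encrypt the storage volume attached to the training instances
        "ResourceConfig": {"VolumeKmsKeyId": kms_key_id},
        # Attach the training containers to your own VPC
        "VpcConfig": {
            "Subnets": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        # Block outbound network calls from the training container
        "EnableNetworkIsolation": True,
        # Encrypt traffic between nodes in distributed training
        "EnableInterContainerTrafficEncryption": True,
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real resources
    print(secure_training_settings("alias/example-key", ["subnet-0example"], ["sg-0example"]))
```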

Cost Optimization

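| Strategy | How |
|---|---|
| Spot instances | Training jobs: 60-70% savings |
| Managed spot | SageMaker managed spot training: set EnableManagedSpotTraining, with MaxRuntimeInSeconds and MaxWaitTimeInSeconds |
| Inference endpoints | Auto-scaling + right-sizing GPU instances (e.g., P4 → G4dn with T4 GPUs) |
| Multi-model endpoints | Deploy 100s of models on one endpoint |
| Serverless inference | SageMaker Serverless Inference: pay for compute used per request, no idle cost |
| AI Services | Pay per API call, no idle cost |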
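
For the managed spot row, a minimal sketch of the relevant `CreateTrainingJob` fields, plus the savings arithmetic. The helper names are hypothetical. The sketch assumes the API's rule that `MaxWaitTimeInSeconds` (total time, including waiting for Spot capacity) must be at least `MaxRuntimeInSeconds`. The 70% figure is taken from the upper end of the range in the table, not from a price list.

```python
def managed_spot_settings(max_runtime_s: int, max_wait_s: int) -> dict:
    """Build the Spot-related part of a training job request."""
    if max_runtime_s <= 0:
        raise ValueError("max_runtime_s must be positive")
    if max_wait_s < max_runtime_s:
        raise ValueError("max_wait_s must be >= max_runtime_s")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,  # cap on actual training time
            "MaxWaitTimeInSeconds": max_wait_s,    # cap on training + waiting for capacity
        },
    }


def spot_cost(on_demand_cost: float, savings_fraction: float) -> float:
    """Estimated cost after applying a fractional Spot discount."""
    if not 0.0 <= savings_fraction < 1.0:
        raise ValueError("savings_fraction must be in [0, 1)")
    return on_demand_cost * (1.0 - savings_fraction)


if __name__ == "__main__":
    print(managed_spot_settings(max_runtime_s=3600, max_wait_s=7200))
    # A job costing 100 on demand, at the table's upper bound of 70% savings
    print(round(spot_cost(100.0, 0.70), 2))
```

Spot capacity can be reclaimed mid-job, so long-running training usually pairs these settings with checkpointing to S3 so an interrupted job can resume.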

Nuggets & Gotchas

  • Do not assume AI Services (Rekognition, Comprehend, etc.) are cleared for PHI by default; you need a Business Associate Addendum (BAA) with AWS: HIPAA eligibility is listed per service, so confirm that each service you plan to use (an AI Service, Bedrock, or SageMaker with your own models) appears on the AWS HIPAA Eligible Services list. Always verify compliance requirements before using AI Services with health data.
  • SageMaker training jobs run on AWS-managed infrastructure, so your data is processed on compute that AWS operates: Even when a job is attached to your VPC, the training instances are provisioned and managed by AWS. For maximum data isolation, use VPC interface endpoints and enable network isolation on the training job.
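  • Terms for Bedrock models vary by provider (Anthropic, Meta, and Mistral each publish their own): Before using Bedrock for sensitive data, read the terms for the specific model. AWS states that Bedrock does not use your prompts and completions to train models and does not share them with model providers, but confirm this against the current documentation for your use case.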
  • AI Services pricing is per API call, so costs add up fast at scale: At high request volumes a Rekognition workload can run to $12,000/month. Budget carefully before deploying AI Services at production scale.
  • SageMaker Canvas is built for in-app predictions and does not hand you a portable model artifact by default, so it is easy to end up locked into Canvas predictions: If you need to deploy the model elsewhere (edge, mobile), use SageMaker Pipelines to export the model or register it in the SageMaker Model Registry.
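
The per-call pricing gotcha above is easy to check with a quick estimate before launch. The sketch below is a hypothetical helper. The price and volume are placeholders chosen to keep the arithmetic simple; they are not taken from the AWS price list or from the figures in this document. Real pricing is often tiered, which this flat-rate version ignores.

```python
def monthly_api_cost(calls_per_month: int, price_per_1000_calls: float) -> float:
    """Flat-rate estimate: (calls / 1000) * price per thousand calls."""
    if calls_per_month < 0 or price_per_1000_calls < 0:
        raise ValueError("inputs must be non-negative")
    return calls_per_month / 1000 * price_per_1000_calls


if __name__ == "__main__":
    # Placeholder inputs: 5 million calls at 1.00 per thousand
    print(monthly_api_cost(5_000_000, 1.00))  # prints 5000.0
```

Running the same estimate at 10x and 100x the expected volume shows quickly whether a per-call service stays cheaper than a self-hosted endpoint.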