
CI/CD Caching Strategies That Cut Pipeline Time by 60%

GeekOnCloud · 6 min read

Every minute your pipeline spends re-downloading dependencies is a minute wasted. I've watched teams accept 45-minute builds as "just how things are" while their caches sit misconfigured or completely unused. Last quarter, I cut a client's average pipeline time from 38 minutes to 14 minutes—a 63% reduction—by fixing their caching strategy. No hardware upgrades, no parallelization magic. Just proper caching.

Here's the thing: most CI/CD caching documentation tells you that you should cache, but not how to cache effectively. Let's fix that.

Understanding What Actually Needs Caching

Before touching any configuration, audit where your pipeline spends time. Run this in your CI environment:

# Add timing to each major step
time npm ci          # or pip install, bundle install, etc.
time docker build .
time npm run build
time npm test

In 90% of projects I've audited, dependency installation and Docker layer building consume 60-80% of total pipeline time. These are your primary cache targets.
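A small Python wrapper can turn those timings into a ranked breakdown by running each step and reporting its share of the total. This is a minimal sketch, not a profiler. It assumes every step runs as a standalone shell command, and the commands in the usage comment are placeholders for your own.

```python
import subprocess
import time


def time_steps(steps: dict[str, str]) -> dict[str, float]:
    """Run each shell command in order and return wall-clock seconds per step."""
    durations: dict[str, float] = {}
    for name, command in steps.items():
        start = time.perf_counter()
        # check=True raises on a failing step, so a broken step can't pass as "fast"
        subprocess.run(command, shell=True, check=True)
        durations[name] = time.perf_counter() - start
    return durations


def share_of_total(durations: dict[str, float]) -> dict[str, float]:
    """Return each step's percentage of the summed duration."""
    total = sum(durations.values())
    if total == 0:
        return {name: 0.0 for name in durations}
    return {name: 100 * seconds / total for name, seconds in durations.items()}


# Example (placeholder commands; run inside your CI job):
# shares = share_of_total(time_steps({"install": "npm ci", "build": "npm run build"}))
```

Sort the result by value and your top two entries are the cache targets.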

The mental model is simple: cache anything that's (1) slow to generate, (2) changes infrequently, and (3) can be keyed deterministically. Dependencies fit perfectly—they change when your lockfile changes, not on every commit.

Dependency Caching Done Right

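Most teams cache their node_modules or .venv directories directly. This works, but it bloats the cache and makes invalidation fragile. npm ci also deletes any existing node_modules before installing, so a cached copy is thrown away anyway. Cache the package manager's own download cache instead and let the package manager handle the extraction.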

Here's a GitLab CI configuration that caches npm properly:

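stages:
  - setup
  - build

variables:
  npm_config_cache: "$CI_PROJECT_DIR/.npm"

# Shared cache definition. A job-level cache: replaces the default instead of
# merging with it, so each job pulls this in via a YAML anchor and sets its policy.
.npm_cache: &npm_cache
  key:
    files:
      - package-lock.json
  paths:
    - .npm/

install_dependencies:
  stage: setup
  script:
    - npm ci --prefer-offline
  cache:
    <<: *npm_cache
    policy: pull-push  # Only this job uploads the cache

build:
  stage: build
  script:
    - npm ci --prefer-offline  # Each job starts clean; this installs from the restored .npm cache
    - npm run build
  cache:
    <<: *npm_cache
    policy: pull  # Don't update cache, just read it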

Key details that matter:

  1. npm ci over npm install: ci installs exactly what package-lock.json specifies, fails if the lockfile and package.json disagree, and never rewrites the lockfile. It is usually faster in CI, too.
  2. --prefer-offline: Uses cached packages without checking the registry when possible.
  3. Cache policy separation: Only the install job pushes to the cache; build and test jobs pull only. This skips redundant uploads and stops parallel jobs from racing to overwrite the same cache key.
  4. File-based cache key: The cache invalidates only when package-lock.json changes, not on every commit.

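For Python projects, cache pip's download cache (~/.cache/pip, or ~/.cache/pypoetry if you use Poetry) keyed on requirements.txt or poetry.lock. For Ruby, install gems into the project with bundle config set --local path vendor/bundle and cache vendor/bundle keyed on Gemfile.lock. On GitLab, cached paths must live inside the project directory, so redirect these caches the same way the npm example does, for instance with PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip".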
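All of these keys follow the same rule: derive the key from the lockfile so it changes exactly when the lockfile does. GitLab's key: files does this from the commits that last changed the listed files, while GitHub's hashFiles() hashes file contents. The Python sketch below takes the content-hash route. The LOCKFILES mapping and the ecosystem-os-digest layout are illustrative assumptions, not either provider's exact format.

```python
import hashlib
from pathlib import Path

# Illustrative mapping from ecosystem to the lockfile that drives its cache key
LOCKFILES = {
    "npm": "package-lock.json",
    "pip": "requirements.txt",
    "poetry": "poetry.lock",
    "bundler": "Gemfile.lock",
}


def cache_key_for(ecosystem: str, lockfile_bytes: bytes, runner_os: str = "Linux") -> str:
    """Key on the SHA-256 of the lockfile: identical contents always give the same key."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    # Stable prefix first, digest last, so the prefix alone can act as a fallback key
    return f"{ecosystem}-{runner_os}-{digest}"


def cache_key(ecosystem: str, project_dir: Path, runner_os: str = "Linux") -> str:
    """Read the ecosystem's lockfile from project_dir and key on its contents."""
    lockfile = Path(project_dir) / LOCKFILES[ecosystem]
    return cache_key_for(ecosystem, lockfile.read_bytes(), runner_os)
```

Putting the stable ecosystem-and-OS part first is what makes a prefix fallback such as npm-Linux- possible.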

Docker Layer Caching: The Biggest Win You're Ignoring

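Docker builds in CI are notoriously slow because, on ephemeral hosted runners, every build starts with an empty layer cache. You're re-pulling base images and re-running every RUN instruction from scratch. BuildKit's registry-based cache export fixes this by storing layers where the next run can pull them. Its RUN --mount=type=cache mounts help too, but only on builders that persist between runs.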

Here's a GitHub Actions workflow with proper Docker caching:

on: push

permissions:
  contents: read
  packages: write  # Lets GITHUB_TOKEN push images and cache to ghcr.io

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      # ghcr.io rejects uppercase repository names, so lowercase owner/repo once
      - name: Set image name
        run: echo "IMAGE=ghcr.io/${GITHUB_REPOSITORY,,}" >> "$GITHUB_ENV"

      - name: Login to Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and Push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ env.IMAGE }}:${{ github.sha }}
          cache-from: type=registry,ref=${{ env.IMAGE }}:buildcache
          cache-to: type=registry,ref=${{ env.IMAGE }}:buildcache,mode=max

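The mode=max flag is crucial: it exports cache for every layer, including the intermediate stages of a multi-stage build. The default, mode=min, only exports layers that end up in the final image, so builder stages are rebuilt from scratch.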

For even better results, structure your Dockerfile to maximize cache hits:

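FROM node:20-alpine AS builder

WORKDIR /app

# Copy dependency files first - these change less frequently
COPY package*.json ./
# Install all dependencies; the build step below usually needs devDependencies
RUN npm ci

# Copy source code last - this changes on every commit
# (list node_modules in .dockerignore so the host copy doesn't overwrite this one)
COPY . .
RUN npm run build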

This ordering means your npm ci layer stays cached unless dependencies change, even when source code changes every commit.
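A toy model makes the ordering argument concrete: treat each layer's cache key as a hash of its parent's key, the instruction text, and the contents of any files it copies, so one change invalidates that layer and everything after it. This Python sketch is a simplification, not BuildKit's actual cache algorithm, and the file names and contents are made up.

```python
import hashlib


def layer_keys(instructions: list[tuple[str, list[str]]], files: dict[str, str]) -> list[str]:
    """Chain a cache key through each (instruction_text, copied_paths) pair."""
    keys: list[str] = []
    parent = ""
    for text, copied in instructions:
        h = hashlib.sha256()
        h.update(parent.encode())        # depends on every earlier layer
        h.update(text.encode())          # and on the instruction itself
        for path in sorted(copied):      # and on the contents of copied files
            h.update(path.encode())
            h.update(files[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def cached_layers(old: list[str], new: list[str]) -> int:
    """Count leading layers whose keys still match; the first miss rebuilds the rest."""
    count = 0
    for a, b in zip(old, new):
        if a != b:
            break
        count += 1
    return count


# Simplified instruction order mirroring the Dockerfile above (hypothetical files)
DOCKERFILE = [
    ("FROM node:20-alpine", []),
    ("COPY package*.json ./", ["package.json", "package-lock.json"]),
    ("RUN npm ci", []),
    ("COPY . .", ["package.json", "package-lock.json", "src/index.js"]),
    ("RUN npm run build", []),
]
```

With this order, editing src/index.js leaves the first three layers (through RUN npm ci) cached, while a lockfile change invalidates everything after FROM. Move COPY . . above RUN npm ci and every source edit reruns the install.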

Cache Storage: Local vs. Distributed

CI providers offer different caching backends with vastly different performance characteristics:

  • GitHub Actions: Cache stored in Azure Blob, 10GB limit per repo, ~30-60 seconds to restore large caches
  • GitLab CI: Local to each runner unless you configure a distributed cache (S3/GCS/MinIO); limits are configurable
  • CircleCI: ~15-second restore times with their storage

For GitHub Actions, cache restore time becomes a bottleneck with large caches. I've seen teams cache 2GB of node_modules and spend 45 seconds restoring it—sometimes longer than a fresh npm ci would take. Measure your restore times:

- name: Record restore start
  run: echo "CACHE_START=$(date +%s)" >> "$GITHUB_ENV"

- name: Restore cache
  id: cache-restore
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ hashFiles('**/package-lock.json') }}

- name: Report cache performance
  run: |
    echo "Cache hit: ${{ steps.cache-restore.outputs.cache-hit }}"
    echo "Restore took: $(( $(date +%s) - CACHE_START ))s"

If restore times exceed 30 seconds, consider splitting caches by directory or using the actions/cache/restore and actions/cache/save actions separately for more control.
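Whether a cache earns its keep depends on every run, not just the happy path. Here is a back-of-the-envelope Python model. It assumes a hit costs the restore plus any remaining install work, and a miss costs a fresh install plus uploading the new cache, ignoring time spent on a failed lookup. Every input is something you would measure yourself.

```python
def expected_step_seconds(
    hit_rate: float,
    restore_s: float,
    fresh_install_s: float,
    save_s: float = 0.0,
    install_after_hit_s: float = 0.0,
) -> float:
    """Expected dependency-step duration with caching enabled.

    Hit:  restore, then any remaining install work (e.g. npm ci from a warm cache).
    Miss: a fresh install, then uploading the new cache.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    hit_cost = restore_s + install_after_hit_s
    miss_cost = fresh_install_s + save_s
    return hit_rate * hit_cost + (1 - hit_rate) * miss_cost


def cache_pays_off(
    hit_rate: float,
    restore_s: float,
    fresh_install_s: float,
    save_s: float = 0.0,
    install_after_hit_s: float = 0.0,
) -> bool:
    """True if caching beats skipping the cache and always installing fresh."""
    expected = expected_step_seconds(hit_rate, restore_s, fresh_install_s, save_s, install_after_hit_s)
    return expected < fresh_install_s
```

With a 45-second restore and a 60-second fresh install, a 90% hit rate still comes out ahead even with a 30-second save on misses. If the restore alone exceeds the fresh install, no hit rate can rescue it.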

Cache Invalidation Strategies That Don't Break

The hardest problem in caching is invalidation. Too aggressive, and you never hit the cache. Too lenient, and you ship bugs from stale dependencies.

My baseline strategy uses two cache key tiers, an exact lockfile match and an OS-level prefix fallback:

- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-

The restore-keys fallback means a partial cache hit still provides value. If your lockfile changed slightly, you restore the old cache and only download the delta.
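The fallback order can be modelled in a few lines of Python. Try the exact key first, then each restore-keys prefix in turn. When several entries share a prefix, take the most recently created one, which is the tie-break GitHub documents for actions/cache. This is a simplified model that ignores branch scoping and eviction, and the in-memory entry list is purely illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str
    created_at: int  # creation time, e.g. a Unix timestamp; larger is newer


def find_cache(
    entries: list[CacheEntry], key: str, restore_keys: list[str]
) -> tuple[str, bool] | None:
    """Return (matched_key, exact_hit), or None when nothing matches."""
    # 1. Exact match on the primary key
    for entry in entries:
        if entry.key == key:
            return entry.key, True
    # 2. Each restore key in order; the newest entry wins among prefix matches
    for prefix in restore_keys:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            newest = max(matches, key=lambda e: e.created_at)
            return newest.key, False
    return None
```

Only an exact match sets cache-hit to true. After a prefix hit, actions/cache saves a fresh entry under the new exact key at the end of the job, so old fallbacks keep accumulating until something rotates them out.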

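For scheduled cache rotation (preventing months-old stale caches), add a date component. The key below reads it from an earlier step with id: date, which can be as small as run: echo "week=$(date -u +%G-%V)" >> "$GITHUB_OUTPUT" (ISO year plus week, so week numbers don't collide across years):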

key: npm-${{ runner.os }}-week${{ steps.date.outputs.week }}-${{ hashFiles('**/package-lock.json') }}

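This forces a full cache refresh weekly while maintaining cache hits within each week, as long as your restore-keys prefix includes the week too. A bare npm-${{ runner.os }}- fallback would simply restore last week's cache.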
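The week component is just an ISO year-week stamp, which Python's datetime exposes through isocalendar(). Here is a minimal sketch for checking when keys roll over. rotating_key is a hypothetical helper mirroring the key layout above, with the lockfile hash passed in as a plain string.

```python
import datetime


def week_stamp(day: datetime.date) -> str:
    """ISO year and zero-padded ISO week, e.g. '2024-01'."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-{iso_week:02d}"


def rotating_key(runner_os: str, day: datetime.date, lockfile_hash: str) -> str:
    """Key that changes when the lockfile changes or a new ISO week starts."""
    return f"npm-{runner_os}-week{week_stamp(day)}-{lockfile_hash}"
```

Using the ISO year rather than the calendar year matters at New Year: 30 December 2024 already belongs to week 2025-01.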

Measuring Your Gains

Don't optimize blindly. Track these metrics before and after implementing caching:

  1. P50 and P95 pipeline duration: Median shows typical experience, P95 catches cache miss impact
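  2. Cache hit rate: Below 80% usually means your cache keys are too specific, or that caches are evicted before they're reused (GitHub drops entries not accessed in 7 days and evicts older ones once a repository passes its 10GB limit)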
  3. Cache restore time vs. fresh install time: If restore approaches fresh install time, your cache is too large
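All three metrics fall out of a list of per-run records. This Python sketch assumes you have already collected three values for each run: its total duration, whether the dependency cache got an exact hit, and how long the restore step took. RunRecord and its fields are hypothetical names, not any CI provider's API. Percentiles use the simple nearest-rank method.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from statistics import median


@dataclass
class RunRecord:
    duration_s: float  # total pipeline duration
    cache_hit: bool    # exact key hit for the dependency cache
    restore_s: float   # time spent in the cache restore step


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(runs: list[RunRecord], fresh_install_s: float) -> dict[str, float]:
    """P50/P95 duration, cache hit rate, and median restore time relative to a fresh install."""
    if not runs:
        raise ValueError("need at least one run")
    durations = [r.duration_s for r in runs]
    return {
        "p50_s": percentile(durations, 50),
        "p95_s": percentile(durations, 95),
        "hit_rate": sum(r.cache_hit for r in runs) / len(runs),
        "restore_vs_fresh": median(r.restore_s for r in runs) / fresh_install_s,
    }
```

A restore_vs_fresh ratio creeping toward 1.0 is the "cache too large" signal from the list above.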

In GitHub Actions, extract these from the workflow run API. In GitLab, the CI/CD Analytics dashboard shows pipeline duration trends.
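For GitHub Actions, the REST endpoint that lists a repository's workflow runs returns run_started_at and updated_at for each run. Their difference approximates wall-clock duration, though it can include time jobs spend waiting for a runner. This standard-library sketch assumes a GITHUB_TOKEN environment variable with read access to Actions. The owner and repo values are placeholders you supply.

```python
from __future__ import annotations

import json
import os
import urllib.request
from datetime import datetime


def fetch_runs(owner: str, repo: str, per_page: int = 20) -> dict:
    """Fetch recent completed workflow runs from the GitHub REST API."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}/actions/runs"
        f"?per_page={per_page}&status=completed"
    )
    request = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def _parse(timestamp: str) -> datetime:
    # GitHub returns UTC timestamps such as "2024-05-01T12:00:00Z"
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")


def run_durations(payload: dict) -> list[float]:
    """Approximate each completed run's duration as updated_at - run_started_at, in seconds."""
    return [
        (_parse(run["updated_at"]) - _parse(run["run_started_at"])).total_seconds()
        for run in payload.get("workflow_runs", [])
        if run.get("status") == "completed"
    ]
```

Calling fetch_runs("your-org", "your-repo") returns the raw payload, and run_durations turns it into seconds you can feed into whatever percentile calculation you use.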

One team I worked with discovered their cache hit rate was 95% on feature branches but 0% on main—they'd configured branch-specific cache keys. Removing the branch component from their cache key immediately improved main branch builds by 8 minutes.

What To Do Monday Morning

Pull your last 20 pipeline runs and calculate the median and P95 duration. Identify your two slowest steps. I guarantee one is dependency installation or Docker build. Implement the caching patterns above for those specific steps. Push the change and compare your next 20 runs.
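The before/after comparison takes only a few lines once you have two windows of durations. This sketch compares medians rather than means so a single cold-cache run doesn't dominate. The function name is illustrative, and any consistent time unit works.

```python
from statistics import median


def reduction_pct(before: list[float], after: list[float]) -> float:
    """Percentage drop in median duration between two windows of runs."""
    old, new = median(before), median(after)
    if old <= 0:
        raise ValueError("baseline median must be positive")
    return 100 * (old - new) / old
```

Plugging in the opening example (38-minute builds down to 14) gives roughly 63%.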

Targeting a 60% reduction isn't aggressive. With proper caching of dependencies and Docker layers, I consistently see 50-70% improvements. The pipelines that don't improve usually have other bottlenecks—test parallelization, network-bound operations, or genuinely compute-heavy builds that can't be cached.

Start with one cache, measure, iterate. Your future self waiting for green builds will thank you.

Written by GeekOnCloud

DevOps & Infrastructure engineer at geekoncloud.com
