Just before AWS re:Invent 2021, AWS announced pull through cache repositories for Amazon Elastic Container Registry (ECR). This new feature keeps your private ECR registry in sync with an upstream registry. Note that, at launch, only upstream repositories hosted on Quay.io and ECR Public are supported. Docker Hub, the most popular registry, isn’t supported, but there is a way to work around this. Another recent announcement confirmed that Docker Official Images are available on ECR Public, so you should be able to use the new “Pull Through Cache” feature for Docker official images. Your flow would look like this:

ECR Pull Through Cache → ECR Public → Docker Hub (official images)
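
In image-name terms, this flow amounts to a prefix rewrite: a Docker official image such as alpine:latest is addressed through your private registry under the rule's prefix, followed by docker/library/. The sketch below builds that URI. The function name cache_uri is hypothetical, and the account ID, region and ecr-public prefix are simply the values used in the pull commands later in this post.

```shell
# Build the private ECR URI under which a pull through cache rule exposes a
# Docker official image. cache_uri is a hypothetical helper, not an AWS tool.
# Arguments: <account-id> <region> <rule-prefix> <image[:tag]>
cache_uri() {
  # Docker official images are published under docker/library/ on ECR Public.
  printf '%s.dkr.ecr.%s.amazonaws.com/%s/docker/library/%s\n' \
    "$1" "$2" "$3" "$4"
}
```

For example, docker pull "$(cache_uri 123456789101 eu-west-1 ecr-public alpine:latest)" would pull the same image as the test further down.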

These two new features can solve the following issues:

  • Pulls go through ECR Public instead of Docker Hub, so the Docker Hub rate limits no longer apply to Docker official images. (ECR Public does rate limit pulls that are not authenticated with an AWS account.)

  • ECR Public is replicated across all AWS regions, so pulls are local to the region you pull from, which reduces latency.

  • By using Pull Through Cache you don’t have to worry about keeping images in sync.

Now let’s test this new “Pull Through Cache” feature. The first test case is pretty basic: an EC2 instance with Docker installed, deployed in a private subnet that routes through a NAT Gateway. I’ll pull the docker/library/alpine:latest Docker image.

[Figure: architecture]

The instance role has the AmazonEC2ContainerRegistryFullAccess and AmazonElasticContainerRegistryPublicFullAccess policies attached. These are broader than I would like, so there is room for improvement here, but pulling images didn’t work with the more restrictive PowerUser policy.
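
A narrower policy might look like the sketch below. It is untested here and rests on an assumption: that the first pull through a cache rule needs ecr:CreateRepository and ecr:BatchImportUpstreamImage on top of the usual pull actions, which is how I read the AWS documentation. The function name pull_through_policy is hypothetical; it only prints a policy document for the given account, region and rule prefix.

```shell
# Print a candidate least-privilege policy for pulling through a cache rule.
# Assumption: these actions are sufficient; verify against the ECR docs.
# Arguments: <account-id> <region> <rule-prefix>
pull_through_policy() {
  account_id="$1"; region="$2"; prefix="$3"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:CreateRepository",
        "ecr:BatchImportUpstreamImage"
      ],
      "Resource": "arn:aws:ecr:${region}:${account_id}:repository/${prefix}/*"
    }
  ]
}
EOF
}
```

Running pull_through_policy 123456789101 eu-west-1 ecr-public > policy.json would give a file that can be attached to the instance role as an inline policy.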

[Screenshot: IAM roles]

Now I’ll create the following “Pull Through Cache” rule:

[Screenshot: pull through cache rule in the AWS console]
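
The same rule can also be created from the command line. This is a minimal sketch, assuming the eu-west-1 region and the ecr-public prefix that appear in the pull commands in this post; the wrapper function create_ecr_public_rule is a hypothetical name, while public.ecr.aws is the registry URL of ECR Public.

```shell
# Create a pull through cache rule that maps <prefix>/* in the private
# registry to ECR Public. Defaults are assumptions taken from this post.
# Arguments: [region] [rule-prefix]
create_ecr_public_rule() {
  region="${1:-eu-west-1}"
  prefix="${2:-ecr-public}"
  aws ecr create-pull-through-cache-rule \
    --region "$region" \
    --ecr-repository-prefix "$prefix" \
    --upstream-registry-url public.ecr.aws
}
```

After calling create_ecr_public_rule, aws ecr describe-pull-through-cache-rules --region eu-west-1 should list the new rule.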

I connect to my EC2 instance (deployed in a private subnet) and pull the docker/library/alpine:latest image:

$ aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 123456789101.dkr.ecr.eu-west-1.amazonaws.com/ecr-public

$ docker pull 123456789101.dkr.ecr.eu-west-1.amazonaws.com/ecr-public/docker/library/alpine:latest
latest: Pulling from ecr-public/docker/library/alpine
59bf1c3509f3: Pull complete
Digest: sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300
Status: Downloaded newer image for 123456789101.dkr.ecr.eu-west-1.amazonaws.com/ecr-public/docker/library/alpine:latest
123456789101.dkr.ecr.eu-west-1.amazonaws.com/ecr-public/docker/library/alpine:latest

[Screenshot: ECR repository]

The image is pulled. That was easy. The “Pull Through Cache” feature will sync future versions to our repository. Now let me verify the connection to ECR. I’m using a route through a NAT Gateway so the connection should go over the public internet.

$ nslookup 123456789101.dkr.ecr.eu-west-1.amazonaws.com
...
Address: 63.33.82.70

63.33.82.70 is a public IP address of my ECR registry, so this connection does go over the public internet.
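
To make this check repeatable, one could classify the resolved address instead of eyeballing it. The sketch below tests whether an IPv4 address falls in one of the RFC 1918 private ranges; is_private_ip is a hypothetical helper. With an interface endpoint and private DNS enabled, I would expect the registry name to resolve to an address inside the VPC range, although that lookup is not shown in this post.

```shell
# Return 0 if the IPv4 address is in an RFC 1918 private range, 1 otherwise.
# Argument: <ipv4-address>
is_private_ip() {
  case "$1" in
    10.*|192.168.*) return 0 ;;                             # 10/8, 192.168/16
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;      # 172.16/12
    *) return 1 ;;
  esac
}
```

For example, is_private_ip 63.33.82.70 fails, which matches the NAT Gateway setup used in this first test.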

For the second test case, I remove the NAT Gateway and its routes and configure the following VPC endpoints:

  • ecr.api (Interface endpoint)
  • ecr.dkr (Interface endpoint)
  • s3 (Gateway endpoint)

[Figure: architecture]

These endpoints are connected to my VPC. After configuring the security groups, I’m able to establish a private connection from my EC2 instance to ECR.
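
The three endpoints can be created with the AWS CLI roughly as follows. This is a sketch under assumptions: one subnet, one security group and one route table, all passed in as placeholders, and private DNS enabled on the interface endpoints. The function name create_ecr_endpoints is hypothetical.

```shell
# Create the two interface endpoints for ECR and the gateway endpoint for S3.
# Arguments: <region> <vpc-id> <subnet-id> <security-group-id> <route-table-id>
create_ecr_endpoints() {
  region="$1"; vpc_id="$2"; subnet_id="$3"; sg_id="$4"; rtb_id="$5"

  # ecr.api serves the ECR API, ecr.dkr serves the Docker registry protocol.
  for svc in ecr.api ecr.dkr; do
    aws ec2 create-vpc-endpoint --region "$region" \
      --vpc-id "$vpc_id" \
      --vpc-endpoint-type Interface \
      --service-name "com.amazonaws.${region}.${svc}" \
      --subnet-ids "$subnet_id" \
      --security-group-ids "$sg_id" \
      --private-dns-enabled
  done

  # Image layers are stored in S3, reached here through a gateway endpoint
  # that is attached to the private subnet's route table.
  aws ec2 create-vpc-endpoint --region "$region" \
    --vpc-id "$vpc_id" \
    --vpc-endpoint-type Gateway \
    --service-name "com.amazonaws.${region}.s3" \
    --route-table-ids "$rtb_id"
}
```

The security group passed in needs to allow inbound HTTPS (port 443) from the instance for the interface endpoints to be reachable.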

Now I’ll pull a new image so I can be sure no cache is being used.

$ docker pull 123456789101.dkr.ecr.eu-west-1.amazonaws.com/ecr-public/docker/library/node:lts
lts: Pulling from ecr-public/docker/library/node
c4cc477c22ba: Pulling fs layer
077c54d048f1: Pulling fs layer
0368544993b2: Pulling fs layer
dd9d1af71976: Waiting
eb8ae1e274f4: Waiting
d77ded8fde28: Waiting
7e078f8be34b: Waiting
eaf05a96d7a3: Waiting
701ddb919de1: Waiting
error pulling image configuration: Get "https://d2glxqk2uabbnd.cloudfront.net/78e9cb-903773873662-febe5bc4-51ed-c6fb-e028-9449267842e2/e6e9695a-7d27-44bc-8bbe-f6aa12aa29d1?CallerAccountId=400469424169&Expires=1639329365&Signature=DIiMLgpQuPHYXkfaOswdB0Y90RNx1oUCYqtptqx-vSevE7-l7YHlEFxs~--Ila~ihIPWz-qYEkTPk0MBqgRFDYdKbt5sveRSfoXSx2KNFtMFzl0MFS3NecNSKs-e9OfQXiaAfMcBn8vv6He2Zr3lFMzf8KKDxALEL3bYeVXvJQuWXA901HXPm44~O3Iq8BDrcxXc7PKfUjkS9XBtZ-HQY9EYGjxyPpck3fFMNNoWFfLz6H2-DKM1D6PplY9CvbYAhTmHwpDDls2Tw8k7WL3tY9bULBX25gBR4WptdeMLFXzneWWSpPkriMHlFqTLqQVkjGlFeZSG2coUQBVswi0mMw__&Key-Pair-Id=KSPLJWGOARK62": 
dial tcp 13.224.227.216:443: i/o timeout

The pull command times out. I see the new repository is created but the image is not there yet:

[Screenshot: ECR node repository]

After some time I see that the image appears. ECR was still pulling the image in the background.
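
Rather than refreshing the console, the presence of the image can be checked from the instance through the ecr.api endpoint. A minimal sketch, with image_cached as a hypothetical helper name:

```shell
# Succeed if the given tag exists in the given private ECR repository.
# Arguments: <region> <repository-name> <tag>
image_cached() {
  aws ecr describe-images \
    --region "$1" \
    --repository-name "$2" \
    --image-ids "imageTag=$3" >/dev/null 2>&1
}
```

For the image in this test, image_cached eu-west-1 ecr-public/docker/library/node lts would start to succeed once ECR has finished importing it.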

[Screenshot: ECR node repository with the image fully pulled]

Now that the image is available in my ECR registry, I can retry the pull.

$ docker pull 123456789101.dkr.ecr.eu-west-1.amazonaws.com/ecr-public/docker/library/node:lts
lts: Pulling from ecr-public/docker/library/node
c4cc477c22ba: Pull complete
077c54d048f1: Pull complete
0368544993b2: Pull complete
dd9d1af71976: Pull complete
eb8ae1e274f4: Pull complete
d77ded8fde28: Pull complete
7e078f8be34b: Pull complete
eaf05a96d7a3: Pull complete
701ddb919de1: Pull complete
Digest: sha256:89b59ce49929d8a8e230946bdb1b58c14cdbbb86c9a7397610afcecfce1be035
Status: Downloaded newer image for 123456789101.dkr.ecr.eu-west-1.amazonaws.com/ecr-public/docker/library/node:lts
123456789101.dkr.ecr.eu-west-1.amazonaws.com/ecr-public/docker/library/node:lts

Now the pull works! I was a little surprised by this behaviour but it’s described in the docs. Also check the other considerations.

When an image is pulled using a pull through cache rule for the first time, if you’ve configured Amazon ECR to use an interface VPC endpoint using AWS PrivateLink then you need to create a public subnet in the same VPC, with a NAT gateway, and then route all outbound traffic to the internet from their private subnet to the NAT gateway in order for the pull to work. Subsequent image pulls don’t require this.

The reason behind this behaviour is described here. During my testing I discovered that if the image is small enough, the pull can succeed the first time. Only bigger images seem to run into this issue.

Many container orchestration tools retry image pulls, so this shouldn’t be a big issue. Still, it would be nice if the very first pull always worked.
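
Outside an orchestrator, the same effect can be had with a small retry wrapper. This is a generic sketch and not something ECR or Docker provides; the retry function and its arguments are hypothetical.

```shell
# Run a command until it succeeds, up to a maximum number of attempts.
# Arguments: <max-attempts> <delay-seconds> <command> [args...]
retry() {
  max_attempts="$1"; delay="$2"; shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

For example, retry 5 60 docker pull 123456789101.dkr.ecr.eu-west-1.amazonaws.com/ecr-public/docker/library/node:lts gives ECR time to finish the background import between attempts. The delay is a guess, since the time the import takes depends on the image size.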

To conclude, I’m pretty happy with these improvements. There is no longer any need to build a mechanism that keeps your images in sync.
Still, there is room for improvement:

  • Support (all?) Docker Hub images, not only the official ones
  • Support additional registries
  • Make the initial image pull succeed when you’re using a private connection to ECR
  • CloudFormation support
  • Support “Pull Through Cache” for private repositories (authentication)
  • Check cached images for updates more often than once per 24 hours
  • Document which IAM permissions are needed and update the AWS managed policies

I have no doubt that AWS will make some improvements soon. I hope you enjoyed it!
