Prisma Cloud from Palo Alto Networks is sponsoring our coverage of AWS re:Invent 2021.
This guide is the last part of a series covering the Amazon SageMaker Studio Lab.
As we mentioned in previous posts, Amazon SageMaker Studio Lab is a standalone service that lets users experiment with building machine learning models. It has no dependencies on Amazon Web Services itself. The environment is based on the popular and familiar JupyterLab notebooks; JupyterLab is the only commonality between Studio Lab and the SageMaker Studio available from the AWS Console. Anyone with an email address can sign up for the service.
The service is completely free. Amazon has opened up an IDE and environment for building machine learning models with no strings attached. This may be the first AWS service that lives outside of the IAM realm with an infinite number of free tier hours.
Except for the branding, the service has almost nothing to do with SageMaker.
In previous posts, we explored the basics of SageMaker Studio Lab and SageMaker Serverless Inference. This tutorial takes the next step, showing how to publish serverless inference endpoints for TensorFlow models.
When you have a model trained within SageMaker Studio Lab or any other environment, you can host it with Amazon SageMaker for inference at scale. If you have followed the steps to train the image classification model on the cats vs. dogs dataset, you can extend the scenario to deploy the same model with SageMaker Serverless Inference.
Prerequisites
You need the following to complete this tutorial:
- AWS account
- Access Key and Secret Key of your AWS account
- SageMaker Execution Role
Follow the steps mentioned in the Amazon SageMaker documentation to create the SageMaker IAM role with the appropriate permissions required to deploy the model.
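If you prefer to script this step, the snippet below is a minimal sketch of creating such a role with Boto3; the role name is illustrative, and the broad managed policies should be scoped down for anything beyond experimentation.

import json
import boto3

iam = boto3.client("iam")

# Trust policy that allows SageMaker to assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

# Illustrative role name; replace with your own naming convention
role = iam.create_role(
    RoleName="SageMakerServerlessInferenceRole",
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)

# Broad managed policies for simplicity; tighten these for production use
iam.attach_role_policy(
    RoleName="SageMakerServerlessInferenceRole",
    PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
)
iam.attach_role_policy(
    RoleName="SageMakerServerlessInferenceRole",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess"
)

# Use this ARN wherever SAGEMAKER_ROLE_ARN appears later in the tutorial
print(role["Role"]["Arn"])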
Step 1: Preparing the Environment
Amazon SageMaker Studio Lab comes with the AWS CLI, which can be used to configure the environment. For this tutorial, we will use the Jupyter notebook and AWS SDK for Python (Boto3) to configure the credentials expected by the SDK.
Run the commands below in a new notebook based on the tf2:python kernel created in the previous tutorial.
!mkdir -p ~/.aws/

%%writefile ~/.aws/credentials
[default]
aws_access_key_id = AWS_ACCESS_KEY
aws_secret_access_key = AWS_SECRET_KEY
Don’t forget to replace AWS_ACCESS_KEY and AWS_SECRET_KEY with your own keys. Note that %%writefile is a cell magic and must appear on the first line of a cell, so run the mkdir command and the %%writefile block as two separate cells.
%%writefile ~/.aws/config
[default]
region=eu-west-1
These commands configure the AWS environment expected by Boto3.
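If you want to confirm that Boto3 picks up the credentials and region, a quick sanity check like the following works; the STS call simply returns the account associated with the keys.

import boto3

# Confirm the region configured above is visible to Boto3
print(boto3.Session().region_name)

# Confirm the credentials are valid by asking STS which account they belong to
print(boto3.client("sts").get_caller_identity()["Account"])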
Let’s prepare the model by archiving it into a tarball, which will later be uploaded to an Amazon S3 bucket to register it with SageMaker.
import tarfile

model_archive = '../model/model.tar.gz'
with tarfile.open(model_archive, mode='w:gz') as archive:
    archive.add('../model/export', recursive=True)
Finally, set the variables used to configure the inference endpoints.
region = 'eu-west-1'
sagemaker_role = SAGEMAKER_ROLE_ARN
container = "763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:2.6.0-cpu-py38-ubuntu20.04"
Don’t forget to replace SAGEMAKER_ROLE_ARN with the ARN of the role created as a part of the prerequisites. We are setting the AWS region to eu-west-1 (Dublin); feel free to replace it with any of the regions that support the serverless inference feature. The last line points to the container image that SageMaker will use during the creation and registration of the model. The TensorFlow SavedModel will be mounted within this container, which already has the code for inference. If you choose a region other than eu-west-1, update the image URI appropriately. You can access the list of available images here.
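If you are working in a different region, one way to look up the matching image URI is the image_uris helper in the SageMaker Python SDK; the snippet below is a sketch assuming the sagemaker package is available in the kernel, with us-east-1 used purely as an example.

from sagemaker import image_uris

# Look up the TensorFlow inference image for a given region and framework version
image_uri = image_uris.retrieve(
    framework="tensorflow",
    region="us-east-1",            # replace with your region
    version="2.6.0",
    image_scope="inference",
    instance_type="ml.m5.xlarge",  # used only to select the CPU variant
)
print(image_uri)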
Step 2: Creating the Amazon SageMaker Model
In this step, we will upload the model tarball to an S3 bucket and associate it with the deep learning container image for inference.
import boto3
import sagemaker
from sagemaker import Session

region = boto3.Session().region_name
sess = Session()
bucket = sess.default_bucket()
client = boto3.client("sagemaker", region_name=region)

# Upload the model tarball to the default SageMaker bucket
model_url = sess.upload_data(path=model_archive, key_prefix='model')
model_name = "dogs-vs-cats"

# Register the model with SageMaker, pointing it at the inference container
response = client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=sagemaker_role,
    Containers=[{
        "Image": container,
        "Mode": "SingleModel",
        "ModelDataUrl": model_url,
    }]
)
The code snippet above has everything SageMaker needs to create a model with the name dogs-vs-cats.
If you access the S3 bucket used by Amazon SageMaker, you will find the model tarball.
If you navigate to the models section of SageMaker in AWS Console, you will see the model registered with it.
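You can also verify both from the notebook itself; the snippet below is a small sketch that reuses the bucket, client, and model_name variables from the previous step to list the uploaded tarball and describe the registered model.

import boto3

s3 = boto3.client("s3")

# List the objects uploaded under the 'model' prefix of the default SageMaker bucket
for obj in s3.list_objects_v2(Bucket=bucket, Prefix="model").get("Contents", []):
    print(obj["Key"], obj["Size"])

# Confirm the model is registered with SageMaker
print(client.describe_model(ModelName=model_name)["ModelArn"])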
Step 3: Defining SageMaker Serverless Inference Endpoint Configuration
This is the most crucial step where we configure the endpoint for serverless inference.
response = client.create_endpoint_config(
    EndpointConfigName="dogs-vs-cats",
    ProductionVariants=[
        {
            "ModelName": "dogs-vs-cats",
            "VariantName": "AllTraffic",
            "ServerlessConfig": {
                "MemorySizeInMB": 2048,
                "MaxConcurrency": 20
            }
        }
    ]
)
The ServerlessConfig attribute tells the SageMaker runtime to provision serverless compute resources that are autoscaled based on the specified parameters: 2GB of RAM and a maximum of 20 concurrent invocations.
When the call finishes, you can see the same configuration in the AWS Console.
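The configuration can also be inspected programmatically; the call below, reusing the client from Step 2, returns the serverless settings we just defined.

# Fetch the endpoint configuration and print its serverless settings
config = client.describe_endpoint_config(EndpointConfigName="dogs-vs-cats")
print(config["ProductionVariants"][0]["ServerlessConfig"])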
Step 4: Creating the Serverless Inference Endpoint
We are ready to create the endpoint based on the configuration defined in the previous step.
response = client.create_endpoint(
    EndpointName="dogs-vs-cats",
    EndpointConfigName="dogs-vs-cats"
)
This creates the inference endpoint; once its status changes to InService, it is ready to accept requests.
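Endpoint creation is asynchronous and usually takes a few minutes. One way to block until the endpoint is ready is the built-in Boto3 waiter shown below, again reusing the client from Step 2.

# Wait until the endpoint status becomes InService
waiter = client.get_waiter("endpoint_in_service")
waiter.wait(EndpointName="dogs-vs-cats")

# Double-check the final status
print(client.describe_endpoint(EndpointName="dogs-vs-cats")["EndpointStatus"])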
Step 5: Invoking the Serverless Inference Endpoint
Let's go ahead and test the endpoint by sending it the image of a dog.
results = {0: 'cat', 1: 'dog'}

Image_Width = 128
Image_Height = 128
Image_Size = (Image_Width, Image_Height)
Image_Channels = 3

from PIL import Image
import numpy as np

# Load the test image and preprocess it the same way as during training
img_file = "../images/image2.jpg"
im = Image.open(img_file)
im = im.resize(Image_Size)
im = np.expand_dims(im, axis=0)
im = np.array(im)
im = im / 255

import boto3
import json

# Invoke the serverless endpoint with the image encoded as a JSON payload
runtime = boto3.client("sagemaker-runtime")
endpoint_name = "dogs-vs-cats"
content_type = "application/json"
payload = json.dumps({"instances": im.tolist()})

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Body=payload
)

# Map the highest-scoring prediction back to its label
pred = json.load(response['Body'])
results[np.argmax(pred['predictions'])]
You should see the endpoint classifying the image correctly.
This concludes the tutorial on publishing serverless inference endpoints for TensorFlow models. We hope you found it useful.
Amazon Web Services is a sponsor of InApps Technology.