I introduced KServe as a scalable, cloud native, open source model server in the previous article. This tutorial will walk you through all the steps required to install and configure KServe on a Google Kubernetes Engine cluster powered by Nvidia T4 GPUs. We will then deploy a TensorFlow model to perform inference.
Step 1 – Launch a GKE Cluster with T4 GPU Node
Assuming you have access to Google Cloud Platform, run the following command to launch a three-node cluster with one Nvidia T4 GPU attached to each node. Replace the project, zone, and other values to reflect your environment.
gcloud beta container clusters create "tns-kserve" \
  --project "janakiramm-sandbox" \
  --zone "asia-southeast1-c" \
  --no-enable-basic-auth \
  --cluster-version "1.22.4-gke.1501" \
  --machine-type "n1-standard-4" \
  --accelerator "type=nvidia-tesla-t4,count=1" \
  --num-nodes "3" \
  --image-type "UBUNTU_CONTAINERD" \
  --disk-type "pd-standard" \
  --disk-size "100" \
  --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append"
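If kubectl on your workstation is not already pointing at the new cluster, fetch its credentials, reusing the same project and zone values as above:

gcloud container clusters get-credentials "tns-kserve" \
  --zone "asia-southeast1-c" \
  --project "janakiramm-sandbox"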
Add a cluster-admin role binding for the GCP user.
kubectl create clusterrolebinding cluster-admin-binding \
  --clusterrole=cluster-admin \
  --user=$(gcloud config get-value core/account)
Install the Nvidia GPU driver through the driver installer DaemonSet and validate that the GPU device plugin is running.
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml
kubectl get pods -n kube-system -l k8s-app=nvidia-gpu-device-plugin
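Once the driver installer finishes, the GPU should also appear as an allocatable resource on the nodes. A quick way to confirm:

kubectl describe nodes | grep nvidia.com/gpu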
Create a pod based on the Nvidia CUDA image to test GPU access. Save the following manifest as gpu-pod.yaml.
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 600; done;"]
    resources:
      limits:
        nvidia.com/gpu: 1
kubectl apply -f gpu-pod.yaml
Run the nvidia-smi command inside the pod to test GPU access.
kubectl exec -it my-gpu-pod -- nvidia-smi
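If the GPU shows up in the nvidia-smi output, the test pod has served its purpose and can be removed:

kubectl delete pod my-gpu-pod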
With the infrastructure in place, let’s proceed with KServe installation.
Step 2 – Installing Istio
Istio is an essential prerequisite for KServe. Knative Serving relies on Istio ingress to expose KServe API endpoints. For version compatibility, check the documentation.
Download the Istio binary to your local workstation and run the istioctl CLI to install it. The download script extracts istioctl into the bin/ directory of the Istio release; add it to your PATH before running the install command.
curl -L https://istio.io/downloadIstio | sh -
istioctl install --set profile=demo -y
Verify that all pods in the istio-system namespace are in the Running state.
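For example:

kubectl get pods -n istio-system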
Step 3 – Installing Knative Serving
Install Knative CRDs and core services.
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.2.0/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.2.0/serving-core.yaml
To integrate Knative with Istio Ingress, run the below commands.
kubectl apply -l knative.dev/crd-install=true -f https://github.com/knative/net-istio/releases/download/knative-v1.2.0/istio.yaml
kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.2.0/istio.yaml
kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.2.0/net-istio.yaml
Finally, configure Knative to use the sslip.io domain for DNS.
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.2.0/serving-default-domain.yaml
Make sure that Knative Serving is successfully running.
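For example, check that all pods in the knative-serving namespace are running:

kubectl get pods -n knative-serving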
Step 4 – Installing Certificate Manager
Install cert manager with the following command:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.7.1/cert-manager.yaml
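The components are deployed into the cert-manager namespace; confirm that they are running before moving on:

kubectl get pods -n cert-manager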
Step 5 – Install KServe Model Server
We are now ready to install the KServe model server on the GKE Cluster.
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.7.0/kserve.yaml
kubectl get pods -n kserve
KServe also installs a couple of custom resource definitions (CRDs). Check them with the command below:
kubectl get crd | grep "kserve"
Step 6 – Configuring a Google Cloud Storage Bucket and Uploading a TensorFlow Model
KServe can pull models from a Google Cloud Storage (GCS) Bucket to serve them for inference. Let’s create the bucket and upload the model.
For this scenario, we will use the model from one of my previous tutorials, which trained a CNN to classify dogs and cats. You can download the pre-trained TensorFlow model from here. Unzip the file and run the commands below to create the GCS bucket and upload the model artifacts.
gsutil mb gs://tns-kserve
gsutil iam ch allUsers:objectViewer gs://tns-kserve
gsutil cp -R model/ gs://tns-kserve
For simplicity, we enabled public read access to the bucket. In a real deployment, you may want to keep the bucket private and provide the service account key as a Kubernetes secret so that KServe can access it; a rough sketch of that approach follows.
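The commands below are only a sketch: the secret name, service account name, and key file path are placeholders, and the key name expected inside the secret should be verified against the KServe credentials documentation for your version.

# Placeholder names: gcs-sa-key, kserve-gcs-sa, and ./kserve-gcs-sa.json are examples only.
# KServe's storage initializer typically looks for the GCP key under
# gcloud-application-credentials.json; confirm this against the KServe docs.
kubectl create secret generic gcs-sa-key \
  --from-file=gcloud-application-credentials.json=./kserve-gcs-sa.json

# Attach the secret to a Kubernetes service account and reference it from the
# InferenceService via spec.predictor.serviceAccountName.
kubectl create serviceaccount kserve-gcs-sa
kubectl patch serviceaccount kserve-gcs-sa -p '{"secrets": [{"name": "gcs-sa-key"}]}'

With that in place, the bucket no longer needs public read access.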
Step 7 – Creating and Deploying the TensorFlow Inference Service
Let’s go ahead and create an inference service pointing to the model uploaded to the GCS bucket. Notice that the predictor requests an Nvidia GPU in its resource limits and requests to ensure that the service lands on a GPU node and uses it for acceleration.
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "dogs-vs-cats"
spec:
  predictor:
    tensorflow:
      storageUri: "gs://tns-kserve/model"
      resources:
        limits:
          nvidia.com/gpu: 1
        requests:
          nvidia.com/gpu: 1
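Save the manifest to a file (inference.yaml is just a placeholder name) and apply it to create the inference service:

kubectl apply -f inference.yaml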
Wait for KServe to generate the endpoint for the inference service.
kubectl get inferenceservice
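To grab just the endpoint URL once it is ready, you can query the status field with a JSONPath expression:

kubectl get inferenceservice dogs-vs-cats -o jsonpath='{.status.url}'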
Step 8 – Performing Inference with KServe and TensorFlow
Install the below Python modules in a virtual environment:
pip install pillow h5py tensorflow requests numpy
Save the client code below as infer.py, then execute it with sample images of dogs and cats to see inference in action.
import argparse
import json

import numpy as np
import requests
import tensorflow
import PIL
from tensorflow.keras.preprocessing import image

# Parse the image path and the model server URI from the command line
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path of the image")
ap.add_argument("-u", "--uri", required=True, help="URI of model server")
args = vars(ap.parse_args())

image_path = args["image"]
uri = args["uri"]

# Load the image, resize it to the model's input shape, and normalize pixel values
img = image.img_to_array(image.load_img(image_path, target_size=(128, 128))) / 255.

# Build the TensorFlow Serving request payload
payload = {
    "instances": [{"conv2d_input": img.tolist()}]
}

# Call the KServe predict endpoint and decode the response
r = requests.post(uri + "/v1/models/dogs-vs-cats:predict", json=payload)
pred = json.loads(r.content.decode("utf-8"))
predict = np.asarray(pred["predictions"]).argmax(axis=1)[0]

print("Dog" if predict == 1 else "Cat")
python infer.py -u http://dogs-vs-cats.default.34.126.156.171.sslip.io -i sample1.jpg

python infer.py -u http://dogs-vs-cats.default.34.126.156.171.sslip.io -i sample2.jpg
This concludes the end-to-end tutorial on KServe, which covered everything you need to start exploring this popular model server.
Feature Image by Rudy and Peter Skitterians from Pixabay.