Google’s Vertex AI is a unified machine learning and deep learning platform that supports AutoML models and custom models. In this tutorial, we will train an image classification model to detect face masks with Vertex AI AutoML. For an introduction to Vertex AI, read this article I published last week at InApps.
To complete this tutorial, you need an active Google Cloud account and the Google Cloud SDK installed on your workstation.
There are three steps involved in training this model: dataset creation, training, and inference.
Dataset creation involves uploading the images and labeling them. Since we are using AutoML, training needs minimal intervention: we don’t need to write code or perform steps like hyperparameter tuning. When the training is done, we can download the model for deployment on edge devices or host it for performing inference.
In the first part of this tutorial, we will focus on creating the dataset. We will use the raw dataset of faces with and without masks created by Prajna Bhandary.
She used image augmentation techniques to generate 600+ images for each class.
While this is not the most comprehensive dataset, it is a good choice for AutoML, which can train models with fewer images.
We will upload these images to a Google Cloud Storage bucket in two folders, with_mask and without_mask, which hold the images for the mask and no-mask labels. A CSV file with the path and label of each image will be uploaded to the same bucket and becomes the input for Vertex AI.
Let’s create the Google Cloud Storage bucket.
    BUCKET=j-mask-nomask
    REGION=EUROPE-WEST4
Feel free to change the values to reflect your bucket name and the region. At the time of launch, Vertex AI AutoML is available only in US-CENTRAL1 (Iowa) and EUROPE-WEST4 (Netherlands) regions.
    gsutil mb -l $REGION -c STANDARD gs://$BUCKET
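If you expect to re-run this step, a small wrapper can skip creation when the bucket is already there. The sketch below is only an illustration: `ensure_bucket` is a hypothetical helper name, and it assumes that a failing `gsutil ls -b` means the bucket does not exist (it also fails when you lack access to a bucket owned by someone else, in which case `gsutil mb` will report the name as taken).

```shell
# ensure_bucket: hypothetical helper, not part of gsutil.
# Creates the bucket only when `gsutil ls -b` cannot find it.
ensure_bucket() {
  local bucket="$1" region="$2"
  if gsutil ls -b "gs://${bucket}" >/dev/null 2>&1; then
    echo "Bucket gs://${bucket} already exists"
  else
    gsutil mb -l "${region}" -c STANDARD "gs://${bucket}"
  fi
}
```

Call it as `ensure_bucket "$BUCKET" "$REGION"` with the variables defined earlier.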
We will now start uploading the images to the above bucket.
Clone the GitHub repository on your local machine.
    git clone https://github.com/prajnasb/observations.git
Navigate to the data directory and run the following commands:
    gsutil cp -r with_mask gs://$BUCKET
    gsutil cp -r without_mask gs://$BUCKET
To upload images simultaneously from both the directories, run the commands in two different terminal windows.
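As an alternative to a second terminal window, gsutil's top-level `-m` option runs the copy in parallel, so both folders can go up in one command. This is a minimal sketch; `upload_images` is a hypothetical helper name, and it assumes you run it from the directory that contains with_mask and without_mask.

```shell
# upload_images: hypothetical helper wrapping a parallel copy.
# -m tells gsutil to perform the operation with multiple threads/processes.
upload_images() {
  local bucket="$1"
  gsutil -m cp -r with_mask without_mask "gs://${bucket}"
}
```

Run it as `upload_images "$BUCKET"`. The resulting folder layout in the bucket should be the same as with the two separate commands.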
Check the Google Cloud Console and browse the folders.
Once the images are uploaded, we need to generate a CSV file with the path and label of each image.
We will run a simple Bash script for this task.
    for filename in with_mask/*.jpg; do
        [ -e "$filename" ] || continue
        echo "gs://$BUCKET/$filename,mask" >> mask-ds.csv
    done
This populates the file mask-ds.csv with entries that look like this:
    gs://j-mask-nomask/with_mask/0-with-mask.jpg,mask
    gs://j-mask-nomask/with_mask/1-with-mask.jpg,mask
    gs://j-mask-nomask/with_mask/10-with-mask.jpg,mask
    gs://j-mask-nomask/with_mask/100-with-mask.jpg,mask
Let’s repeat this for the second folder to generate the path and label for no-mask.
    for filename in without_mask/*.jpg; do
        [ -e "$filename" ] || continue
        echo "gs://$BUCKET/$filename,no-mask" >> mask-ds.csv
    done
This will append lines to the CSV file with the path of images with no mask.
    gs://j-mask-nomask/without_mask/0.jpg,no-mask
    gs://j-mask-nomask/without_mask/1.jpg,no-mask
    gs://j-mask-nomask/without_mask/10.jpg,no-mask
    gs://j-mask-nomask/without_mask/100.jpg,no-mask
    gs://j-mask-nomask/without_mask/101.jpg,no-mask
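Because both loops append with `>>`, running them a second time duplicates every row. One way to make the step repeatable is to fold the two loops into a single function that truncates the file first. The sketch below assumes the same folder names and labels used above; `build_csv` is a hypothetical helper name.

```shell
# build_csv: hypothetical helper that regenerates the CSV from scratch.
# Each "folder:label" pair maps a local directory to its class label.
build_csv() {
  local bucket="$1" out="$2" pair dir label filename
  : > "$out"   # truncate, so re-running never duplicates rows
  for pair in "with_mask:mask" "without_mask:no-mask"; do
    dir="${pair%%:*}"
    label="${pair##*:}"
    for filename in "$dir"/*.jpg; do
      [ -e "$filename" ] || continue   # skip when the glob matches nothing
      echo "gs://${bucket}/${filename},${label}" >> "$out"
    done
  done
}
```

After `build_csv "$BUCKET" mask-ds.csv`, a quick `cut -d, -f2 mask-ds.csv | sort | uniq -c` shows how many rows each label received, assuming no file name contains a comma.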
Finally, we need to upload the CSV file to the bucket.
    gsutil cp mask-ds.csv gs://$BUCKET
The CSV file becomes the critical input to Vertex AI AutoML to create the final dataset.
Running the command gsutil ls gs://$BUCKET confirms that the CSV file was successfully uploaded to the Google Cloud Storage bucket.
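Beyond listing the bucket, it can be worth checking that every CSV row has a matching image in the bucket before importing. The following is a rough sketch rather than a complete validator: `check_upload` is a hypothetical helper, and it only compares counts, assuming all images end in `.jpg` and the bucket holds nothing else with that extension.

```shell
# check_upload: hypothetical helper comparing CSV rows with uploaded .jpg objects.
check_upload() {
  local bucket="$1" csv="$2" rows objects
  rows="$(grep -c . "$csv")"                                   # non-empty CSV lines
  objects="$(gsutil ls "gs://${bucket}/**" | grep -c '\.jpg$')" # recursive listing
  if [ "$rows" -eq "$objects" ]; then
    echo "OK: ${rows} CSV rows match ${objects} uploaded images"
  else
    echo "Mismatch: ${rows} CSV rows vs ${objects} uploaded images" >&2
    return 1
  fi
}
```

Invoke it as `check_upload "$BUCKET" mask-ds.csv`. A mismatch usually points to an interrupted upload or a CSV generated from a different directory.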
With the data uploaded to cloud storage, let’s turn that into a Vertex AI dataset.
Access the Vertex AI Dashboard in the Google Cloud Console and enable the API. Choose the region and click on create dataset:
Give the dataset a name, choose image classification with a single label, and click on create:
In the next section, choose "Select import files from Cloud Storage":
Browse the Cloud Storage bucket and select the CSV file uploaded earlier, and click on continue:
The import process takes a few minutes. When it completes, you are taken to the next page that shows all of the images identified from the dataset, both labeled and unlabeled images:
You may see some warnings and errors during the import process due to duplicate images found by Vertex AI. They can be safely ignored.
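The first two console steps, enabling the API and creating an empty image dataset, can also be scripted. Treat the sketch below as an assumption-laden illustration rather than the method used in this tutorial: the helper names and the arguments are placeholders, and the REST endpoint and metadata schema URI reflect my understanding of the public Vertex AI API, so verify them against the current API reference before relying on them. Importing the CSV is left to the console as described above.

```shell
# Hypothetical helpers; project ID and display name are placeholders you supply.
enable_vertex_ai() {
  gcloud services enable aiplatform.googleapis.com --project "$1"
}

create_image_dataset() {
  local project="$1" region display_name
  # The API hostname and resource path use the lowercase region ID.
  region="$(echo "$2" | tr '[:upper:]' '[:lower:]')"
  display_name="$3"
  # Assumed request shape: datasets.create with the image metadata schema.
  curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "{\"display_name\": \"${display_name}\", \"metadata_schema_uri\": \"gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml\"}" \
    "https://${region}-aiplatform.googleapis.com/v1/projects/${project}/locations/${region}/datasets"
}
```

For example, `enable_vertex_ai my-project` followed by `create_image_dataset my-project "$REGION" face-masks`, where `my-project` and `face-masks` are made-up values. The call returns a long-running operation, so the dataset may take a moment to appear in the console.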
We are now ready to kick off the training. Stay tuned for the next part of the tutorial for a walkthrough of the training and inference process.