Google Cloud Platform: Vertex AI AutoML
Until recently, training an AI model meant days of gathering and annotating data, then finding the right compute resources, then training the model for hours, often at considerable cost. GCP now offers ways to make the whole training process much simpler.
In this blog, I will show how you can train your own object detection and image classification models with Vertex AI AutoML on Google Cloud.
What is Vertex AI?
Vertex AI is a machine learning (ML) platform for training and deploying ML models and AI applications. It offers many features; this article focuses on the options for model training.
Vertex AI offers two options for model training:
1. AutoML:
Lets you train models on tabular, image, text, or video data without writing code or preparing data splits.
2. Custom Training:
Gives you full control over the training process. You write your own training code, use your preferred ML framework, and choose your own hyperparameter tuning options.
Using AutoML:
To use AutoML, you first need to get your dataset ready. The data can be image, tabular, text, or video data. In this post I use image data.
- Store your data (the CSV file and the image files) in a Google Cloud Storage (GCS) bucket.
- To upload a folder to GCS, use the following command. To download a folder, swap the source and the destination:
gcloud storage cp <folder path> <bucket path> --recursive
- Then create a dataset as described below.
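If you would rather script the upload than run the gcloud command, the sketch below uses the `google-cloud-storage` Python client library. It is a minimal sketch that assumes the library is installed and credentials are already configured. The helper names `plan_uploads` and `upload_folder` are hypothetical. The path mapping is kept separate from the upload, so you can check which gs:// URIs your images will get (the URIs your CSV must reference) without touching GCS.

```python
import os
from typing import List, Tuple


def plan_uploads(local_dir: str, bucket: str, prefix: str = "") -> List[Tuple[str, str]]:
    """Pair every file under local_dir with the gs:// URI it would get in the bucket.

    Hypothetical helper: object names mirror the folder layout relative to
    local_dir and are optionally placed under `prefix`.
    """
    prefix = prefix.strip("/")
    plan = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            local_path = os.path.join(root, name)
            # Cloud Storage object names use forward slashes on every OS.
            rel = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            object_name = f"{prefix}/{rel}" if prefix else rel
            plan.append((local_path, f"gs://{bucket}/{object_name}"))
    return sorted(plan)


def upload_folder(local_dir: str, bucket: str, prefix: str = "") -> None:
    """Upload the planned files with the google-cloud-storage client library."""
    # Imported here so plan_uploads() works without the library installed.
    from google.cloud import storage

    gcs_bucket = storage.Client().bucket(bucket)
    for local_path, uri in plan_uploads(local_dir, bucket, prefix):
        object_name = uri[len(f"gs://{bucket}/"):]
        gcs_bucket.blob(object_name).upload_from_filename(local_path)
```

For example, `plan_uploads("./flowers", "my-bucket", "flowers")` lists the URIs to put in the CSV, and `upload_folder` with the same arguments performs the upload. The bucket and folder names here are placeholders.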
Image Classification:
1. Create an image classification dataset
- Go to Datasets under Vertex AI and click ‘Create dataset’.
- Name your dataset and choose ‘Image Classification (Single Label)’ or ‘Image Classification (Multi Label)’.
- Set your region and click ‘Create’.
- Select ‘Import files from Cloud Storage’ and specify the Cloud Storage URI of the CSV file that contains the image locations and labels. Make sure the CSV and the images are in the same bucket.
- Click ‘Continue’ to import the dataset. This might take a few minutes.
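The same dataset can also be created from code. The sketch below first checks the same-bucket advice from the steps above, then calls the Vertex AI Python SDK (`google-cloud-aiplatform`). It is a minimal sketch that assumes the SDK is installed and authenticated. The function names, the `SCHEMAS` mapping and the `kind` parameter are hypothetical conveniences. The `bounding_box` schema is the one meant for object detection datasets.

```python
from typing import Iterable


def bucket_of(uri: str) -> str:
    """Return the bucket name of a gs:// URI, e.g. 'bucket' for gs://bucket/a.jpeg."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    return uri[len("gs://"):].split("/", 1)[0]


def check_same_bucket(csv_uri: str, image_uris: Iterable[str]) -> None:
    """Raise ValueError if any image lives outside the bucket that holds the CSV."""
    expected = bucket_of(csv_uri)
    strays = sorted({bucket_of(u) for u in image_uris} - {expected})
    if strays:
        raise ValueError(f"images outside bucket {expected!r}: {strays}")


# Hypothetical mapping: dataset kind -> attribute of
# aiplatform.schema.dataset.ioformat.image holding the import schema URI.
SCHEMAS = {
    "single_label": "single_label_classification",
    "multi_label": "multi_label_classification",
    "object_detection": "bounding_box",
}


def create_image_dataset(display_name: str, csv_uri: str, project: str,
                         location: str, kind: str = "single_label"):
    """Create an image dataset and import the CSV via the Vertex AI SDK."""
    from google.cloud import aiplatform  # needs google-cloud-aiplatform and credentials

    aiplatform.init(project=project, location=location)
    schema_uri = getattr(aiplatform.schema.dataset.ioformat.image, SCHEMAS[kind])
    return aiplatform.ImageDataset.create(
        display_name=display_name,
        gcs_source=csv_uri,
        import_schema_uri=schema_uri,
    )
```

The bucket check runs locally. The import itself, like the console flow, can take a few minutes before `ImageDataset.create` returns.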
CSV format: [ML_USE], GCS_FILE_PATH, [LABEL]
List of columns
- ML_USE (Optional) — For data split purposes when training a model. Use TRAINING, TEST, or VALIDATION.
- GCS_FILE_PATH — This field contains the Cloud Storage URI for the image. Cloud Storage URIs are case-sensitive.
- LABEL (Optional) — Labels must start with a letter and only contain letters, numbers, and underscores.
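ML_USE is optional, but you can fill it in yourself to control which images land in which split. The sketch below assigns TRAINING, VALIDATION, or TEST by hashing each image URI. Because the hash is deterministic, re-running the script never moves an image from one split to another. The `assign_ml_use` name and the 80/10/10 ratio are illustrative choices, not Vertex AI requirements.

```python
import hashlib


def assign_ml_use(gcs_uri: str, train: float = 0.8, validation: float = 0.1) -> str:
    """Deterministically map an image URI to TRAINING, VALIDATION or TEST.

    The URI is hashed to a position in [0, 1). The fractions are illustrative,
    and whatever remains (1 - train - validation) goes to TEST.
    """
    digest = hashlib.sha256(gcs_uri.encode("utf-8")).hexdigest()
    position = int(digest[:8], 16) / 0x100000000  # first 32 bits -> [0, 1)
    if position < train:
        return "TRAINING"
    if position < train + validation:
        return "VALIDATION"
    return "TEST"
```

For example, `assign_ml_use("gs://bucket/filename1.jpeg")` always returns the same split for that URI. That value can go in the first CSV column.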
Example CSV — image_classification_single_label.csv:
test,gs://bucket/filename1.jpeg,daisy
training,gs://bucket/filename2.gif,dandelion
gs://bucket/filename3.png
…
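Before importing, you can catch format mistakes locally. The sketch below parses one row of the single-label format and applies the column rules listed above. ML_USE is compared case-insensitively because the example file writes it in lowercase. The function name is hypothetical, and multi-label rows (several labels per image) are out of scope for this sketch.

```python
import csv
import re
from typing import Optional, Tuple

ML_USES = {"TRAINING", "TEST", "VALIDATION"}
# Labels must start with a letter and contain only letters, digits and underscores.
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def parse_classification_row(line: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Parse one single-label row: [ML_USE],GCS_FILE_PATH,[LABEL]."""
    fields = next(csv.reader([line]), [])
    ml_use = None
    # ML_USE is optional; accept it in any case.
    if fields and fields[0].upper() in ML_USES:
        ml_use = fields.pop(0).upper()
    if not fields or not fields[0].startswith("gs://"):
        raise ValueError(f"missing gs:// path in {line!r}")
    if len(fields) > 2:
        raise ValueError(f"too many columns for a single-label row: {line!r}")
    path = fields[0]
    label = fields[1] if len(fields) == 2 and fields[1] else None
    if label is not None and not LABEL_RE.fullmatch(label):
        raise ValueError(f"invalid label {label!r}")
    return ml_use, path, label
```

Running every line of the CSV through this function before uploading makes a bad label or a missing gs:// prefix show up right away, instead of during the import.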
- Once the dataset has been imported correctly, click ‘Train new model’.
- Enter the model details and set the training budget in node hours. Enable early stopping.
- Once the model has been trained, go to the ‘Evaluate’ tab to see the results.
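The console's budget and early-stopping settings have direct equivalents in the Vertex AI Python SDK. The SDK takes the budget in milli node hours and phrases early stopping as a "disable" flag. The sketch below makes that translation explicit. It is a minimal sketch that assumes `google-cloud-aiplatform` is installed and that `dataset` is an `ImageDataset` created earlier. The helper names are hypothetical. For object detection, pass `prediction_type="object_detection"`.

```python
def training_kwargs(node_hours: float, early_stopping: bool = True) -> dict:
    """Translate the console's node-hour budget and early-stopping toggle
    into keyword arguments for AutoMLImageTrainingJob.run()."""
    if node_hours <= 0:
        raise ValueError("node_hours must be positive")
    return {
        # The SDK takes the budget in milli node hours: 1 node hour = 1000.
        "budget_milli_node_hours": int(round(node_hours * 1000)),
        # The SDK flag is negated: False keeps early stopping enabled.
        "disable_early_stopping": not early_stopping,
    }


def train_image_model(dataset, display_name: str, node_hours: float,
                      prediction_type: str = "classification", multi_label: bool = False):
    """Start an AutoML image training job on an existing ImageDataset."""
    from google.cloud import aiplatform  # needs google-cloud-aiplatform and credentials

    job = aiplatform.AutoMLImageTrainingJob(
        display_name=display_name,
        prediction_type=prediction_type,  # "classification" or "object_detection"
        multi_label=multi_label,
    )
    return job.run(
        dataset=dataset,
        model_display_name=display_name,
        **training_kwargs(node_hours, early_stopping=True),
    )
```

By default `job.run` blocks until training finishes and returns the trained model, which then appears in the console, where you can evaluate it.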
Object Detection:
1. Create an object detection dataset
- Go to Datasets under Vertex AI and click ‘Create dataset’.
- Name your dataset and choose ‘Image Object Detection’.
- Make sure the region is set to ‘us-central1 (Iowa)’ and click ‘Create’.
- Select ‘Import files from Cloud Storage’ and specify the Cloud Storage URI of the CSV file that contains the image locations, labels, and bounding boxes. Make sure the CSV and the images are in the same bucket.
- Click ‘Continue’ to import the dataset. This might take a few minutes.
CSV format: [ML_USE], GCS_FILE_PATH, [LABEL], [BOUNDING_BOX]*
List of columns
- ML_USE (Optional). For data split purposes when training a model. Use TRAINING, TEST, or VALIDATION.
- GCS_FILE_PATH. This field contains the Cloud Storage URI for the image. Cloud Storage URIs are case-sensitive.
- LABEL. Labels must start with a letter and only contain letters, numbers, and underscores.
- BOUNDING_BOX. A bounding box for an object in the image. Specifying a bounding box involves more than one column.
· Each vertex is specified by x, y coordinate values. Coordinates are normalized float values [0,1]; 0.0 is X_MIN or Y_MIN, 1.0 is X_MAX or Y_MAX.
For example, a bounding box for the entire image is expressed as (0.0,0.0,,,1.0,1.0,,), or (0.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0).
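Annotations usually start out in pixel coordinates. The sketch below shows the normalization described above: it divides each coordinate by the image width or height, then writes the result in the two-vertex column layout. The function name and the rounding to four decimals are illustrative assumptions.

```python
def normalized_box(x_min: float, y_min: float, x_max: float, y_max: float,
                   width: int, height: int) -> str:
    """Turn a pixel-space box into the two-vertex form X_MIN,Y_MIN,,,X_MAX,Y_MAX,,"""
    if not (0 <= x_min < x_max <= width and 0 <= y_min < y_max <= height):
        raise ValueError("box must lie inside the image, with min < max")
    # Divide by the image size so every coordinate lands in [0, 1].
    ax, ay = round(x_min / width, 4), round(y_min / height, 4)
    cx, cy = round(x_max / width, 4), round(y_max / height, 4)
    # The unused vertices B and D are left empty, hence the ",," gaps.
    return f"{ax},{ay},,,{cx},{cy},,"
```

For a 1000×500 image, `normalized_box(300, 150, 700, 300, 1000, 500)` gives `0.3,0.3,,,0.7,0.6,,`. A box covering the whole image gives `0.0,0.0,,,1.0,1.0,,`.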
One of two methods can be used to specify the bounding box for an item:
1. Two points on the rectangle that are diagonally opposite vertices (two sets of x,y coordinates):
A. X_MIN,Y_MIN
C. X_MAX,Y_MAX
as shown in this example, where the empty columns stand for the unused vertices B and D:
A,,C,
X_MIN,Y_MIN,,,X_MAX,Y_MAX,,
2. All four vertices of the rectangle, given as:
X_MIN,Y_MIN,X_MAX,Y_MIN,X_MAX,Y_MAX,X_MIN,Y_MAX
If the four vertices you provide don't form a rectangle parallel to the edges of the image, Vertex AI specifies vertices that do form such a rectangle.
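To anticipate what a skewed box might turn into, you can compute an axis-parallel rectangle yourself. The sketch below takes the smallest rectangle parallel to the image edges that contains all four points. This is an assumed, plausible interpretation for illustration. It is not Vertex AI's documented algorithm, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def enclosing_box(values: Sequence[float]) -> Tuple[float, float, float, float]:
    """Given the eight four-vertex values x1,y1,...,x4,y4 (normalized),
    return (x_min, y_min, x_max, y_max) of the smallest rectangle parallel
    to the image edges that contains all four points."""
    if len(values) != 8:
        raise ValueError("expected 8 values: x,y for each of the four vertices")
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("coordinates must be normalized to [0, 1]")
    xs, ys = values[0::2], values[1::2]  # even indexes are x, odd are y
    return min(xs), min(ys), max(xs), max(ys)
```

For an already-rectangular input such as `0.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0`, the result is unchanged: `(0.0, 0.0, 1.0, 1.0)`.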
Example CSV — object_detection.csv:
test,gs://bucket/filename1.jpeg,Tomato,0.3,0.3,,,0.7,0.6,,
training,gs://bucket/filename2.gif,Tomato,0.8,0.2,,,1.0,0.4,,
gs://bucket/filename2.gif
…
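As with classification, you can check rows locally before the import. The sketch below parses one object detection row and accepts both bounding box layouts described above. Either way, it returns the box as (x_min, y_min, x_max, y_max). For four-vertex input it takes the enclosing axis-parallel rectangle, which is an assumption about how such input would be read. The function and type names are hypothetical, and ML_USE is compared case-insensitively to match the lowercase example.

```python
import csv
import re
from typing import Optional, Tuple

ML_USES = {"TRAINING", "TEST", "VALIDATION"}
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def parse_detection_row(line: str) -> Tuple[Optional[str], str, Optional[str], Optional[Box]]:
    """Parse one row of [ML_USE],GCS_FILE_PATH,[LABEL],[BOUNDING_BOX]*."""
    fields = next(csv.reader([line]), [])
    ml_use = None
    if fields and fields[0].upper() in ML_USES:
        ml_use = fields.pop(0).upper()
    if not fields or not fields[0].startswith("gs://"):
        raise ValueError(f"missing gs:// path in {line!r}")
    path, rest = fields[0], fields[1:]
    if not rest:
        return ml_use, path, None, None  # image listed without any object
    label, coords = rest[0], rest[1:]
    if not LABEL_RE.fullmatch(label):
        raise ValueError(f"invalid label {label!r}")
    if len(coords) != 8:
        raise ValueError(f"expected 8 bounding-box columns in {line!r}")
    if all(coords):  # four-vertex form
        values = [float(c) for c in coords]
    elif not any(coords[2:4] + coords[6:8]) and all(coords[0:2] + coords[4:6]):
        values = [float(c) for c in coords[0:2] + coords[4:6]]  # vertices A and C
    else:
        raise ValueError(f"unrecognised bounding-box layout in {line!r}")
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("coordinates must be normalized to [0, 1]")
    xs, ys = values[0::2], values[1::2]
    return ml_use, path, label, (min(xs), min(ys), max(xs), max(ys))
```

Applied to the first example row, this returns `("TEST", "gs://bucket/filename1.jpeg", "Tomato", (0.3, 0.3, 0.7, 0.6))`.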
- Once the dataset has been imported correctly, click ‘Train new model’.
- Enter the model details and set the training budget in node hours. Enable early stopping and select the incremental training checkbox.
- Once the model has been trained, go to the ‘Evaluate’ tab to see the results.
2. Incremental training:
- Once your model is trained, you can use it as a base model and train further models from it on an updated dataset.
- To do so, import the new data into the same dataset. Click ‘Train new model’, select incremental training, and choose the base model you want to use. Then train the model as before.