Google Drive Ingestion

Google Drive Ingestion lets you pull files stored in Google Drive directly into your AI Data Foundry project.

After connecting a source, use Import now to ingest files immediately, or set up an automatic schedule to periodically collect newly added files.


Prerequisites

Google Drive sources authenticate with a Google Cloud service account. Create a service account and key as described below, then share the target folder with that service account.

1. Enable the Google Drive API and create a service account

  1. In the Google Cloud Console, select or create a project.
  2. Under APIs & Services → Library, search for Google Drive API and click Enable.
  3. Under IAM & Admin → Service Accounts → Create service account, create a new service account. (No IAM role is required — access is granted through Drive sharing instead.)
  4. Open the service account and choose Keys → Add key → Create new key → JSON to download the key file.

The downloaded JSON must contain at least client_email and private_key. This entire file is the credential you will use to connect.
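
Before pasting the key, it can help to confirm locally that the file parses and carries both fields. The following is a minimal sketch of such a check; the function name missing_key_fields is hypothetical and is not part of AI Data Foundry.

```python
import json
from typing import List

# The two fields the text above says the key file must contain.
REQUIRED_FIELDS = ("client_email", "private_key")


def missing_key_fields(raw_json: str) -> List[str]:
    """Return the required fields that are absent or empty in the key text."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        # Not valid JSON at all: treat every required field as missing.
        return list(REQUIRED_FIELDS)
    if not isinstance(data, dict):
        # Valid JSON, but not an object (for example a list or a string).
        return list(REQUIRED_FIELDS)
    return [name for name in REQUIRED_FIELDS if not data.get(name)]
```

An empty list means both fields are present. A non-empty list names what to look for before retrying the connection.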

2. Share the target folder with the service account

A service account has no files in its own Drive, so you must share the folder that contains the files you want to ingest with the service account's email.

  1. In Google Drive, right-click the folder → Share.
  2. Add the service account email (the client_email value from the JSON, in the form ...@....iam.gserviceaccount.com) with Viewer access or higher.
  3. (Optional) To ingest a specific folder only, note its Folder ID — the string after folders/ in the folder's URL.
    • Example: https://drive.google.com/drive/folders/1AbC...XyZ → the Folder ID is 1AbC...XyZ

Item                  Description
Service Account JSON  The entire contents of the service account key file (JSON) issued by Google Cloud
Folder ID (Optional)  The Drive folder ID to scope ingestion to. Leave empty to use the service account's My Drive root
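
If you collect folder URLs in bulk, the Folder ID can be pulled out programmatically. This is a small sketch, assuming that Folder IDs consist only of letters, digits, hyphens, and underscores; the helper name folder_id_from_url and the ID used in the usage note are made up for illustration.

```python
import re
from typing import Optional

# Assumption: a Folder ID is the run of letters, digits, "-" and "_"
# that follows "folders/" in the URL.
FOLDER_URL_PATTERN = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def folder_id_from_url(url: str) -> Optional[str]:
    """Return the Folder ID found in a Drive folder URL, or None."""
    match = FOLDER_URL_PATTERN.search(url)
    return match.group(1) if match else None
```

For example, folder_id_from_url("https://drive.google.com/drive/folders/abc123_XY-z?usp=sharing") returns "abc123_XY-z", with the trailing query string ignored.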

The connection requests read-only access (scope https://www.googleapis.com/auth/drive.readonly). The ingester never modifies or deletes files.


Step 1: Connect a Google Drive Source

  1. Click the Ingestion tab in the left sidebar.

  2. Click Connect a new source.

  3. Select Google Drive as the source type (it is selected by default) and fill in the fields.

    Field                 Required  Description
    Name                  Yes       A friendly name for this source (e.g., "My Drive - quarterly reports")
    Service Account JSON  Yes       Paste the entire service account key (JSON) you downloaded
    Folder ID (optional)  No        Enter a Folder ID to ingest a specific folder. Leave empty to use the service account's My Drive root
  4. Click Test and save to automatically test that the specified folder (or root) is accessible.

  5. If the test passes, the source is registered and a source card appears.


Credentials are encrypted before storage and are never included in API responses.


Step 2: Ingest Files

Once the source is connected, you can pull files from the Drive folder into your project.

Click Import now on the source card to scan the target folder and automatically ingest eligible files.

How Ingestion Works

  • First run: Scans and ingests the files directly inside the target folder (or My Drive root).
  • Subsequent runs: Only files modified since the last ingestion are collected (incremental ingestion based on modifiedTime).
  • Deduplication: Files already ingested (same source + same Drive file) are not imported again.
  • Per-run limit: Scheduled (automatic) runs process up to 200 files per run; manual (Import now) runs process up to 50 files per run. Any remaining files are picked up in the next run.
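
The rules above can be read as a three-stage filter: incremental cutoff, deduplication, then the per-run cap. The sketch below is one possible reading, not the product's actual implementation. The names DriveFile and select_for_run are hypothetical, and the oldest-first ordering is an assumption, since the text does not say which files are deferred when the cap is hit.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Set

SCHEDULED_LIMIT = 200  # per-run cap for scheduled runs (from the text)
MANUAL_LIMIT = 50      # per-run cap for manual "Import now" runs (from the text)


@dataclass(frozen=True)
class DriveFile:
    file_id: str
    modified_time: datetime


def select_for_run(
    files: List[DriveFile],
    last_run: Optional[datetime],
    already_ingested: Set[str],
    manual: bool,
) -> List[DriveFile]:
    """Pick the files one run would process under the documented rules."""
    limit = MANUAL_LIMIT if manual else SCHEDULED_LIMIT
    candidates = [
        f for f in files
        # First run (last_run is None) considers everything; later runs
        # only consider files modified since the last ingestion.
        if (last_run is None or f.modified_time > last_run)
        # Deduplication: skip Drive files this source already ingested.
        and f.file_id not in already_ingested
    ]
    # Assumption: oldest changes first, so the remainder rolls to the next run.
    candidates.sort(key=lambda f: f.modified_time)
    return candidates[:limit]
```

Under this reading, a folder with 60 new files yields 50 files on a manual first run and all 60 on a scheduled first run.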

Subfolders are not traversed automatically. Ingestion targets only the files directly inside the specified folder (or My Drive root if the Folder ID is empty). To ingest files in a subfolder, add a separate source using that subfolder's Folder ID.
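
To see why only direct children are picked up, it helps to look at how a Drive API v3 files.list search can be scoped. The sketch below builds a q search string using the Drive query operators "in parents", "trashed", "mimeType", and "modifiedTime". Whether AI Data Foundry issues exactly this query is an assumption, and build_drive_query is a hypothetical helper. Pass a timezone-aware datetime for modified_after.

```python
from datetime import datetime, timezone
from typing import Optional

FOLDER_MIME = "application/vnd.google-apps.folder"


def build_drive_query(folder_id: Optional[str],
                      modified_after: Optional[datetime]) -> str:
    """Build a Drive API v3 `q` string for files directly inside one folder."""
    # "root" is the Drive API alias for the caller's My Drive root.
    parent = folder_id or "root"
    clauses = [
        "'%s' in parents" % parent,       # direct children only, no recursion
        "trashed = false",
        "mimeType != '%s'" % FOLDER_MIME,  # subfolders themselves are skipped
    ]
    if modified_after is not None:
        # Drive accepts RFC 3339 timestamps; times without an offset are UTC.
        stamp = modified_after.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
        clauses.append("modifiedTime > '%s'" % stamp)
    return " and ".join(clauses)
```

Because "in parents" matches only a file's immediate parent, files nested one level deeper never match, which is consistent with the need to add a separate source per subfolder.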

Progress is shown in real time in the Recent ingestion jobs section at the bottom of the page.

Google Document Format Conversion

Google-native files are automatically converted to Office formats when downloaded.

Source (Google)  Converted to
Google Docs      .docx
Google Sheets    .xlsx
Google Slides    .pptx

All other files (PDF, images, etc.) are imported as-is. Other Google-native types such as Drawings and Forms have no conversion target and may not be ingested.
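
The conversion rules can be expressed as a lookup keyed on the Drive MIME type. The MIME type strings below are the standard Google Workspace and Office Open XML identifiers. The function plan_download and its return values are hypothetical, and treating unmapped Google-native types as skipped is an assumption based on the "may not be ingested" wording above.

```python
from typing import Optional, Tuple

GOOGLE_NATIVE_PREFIX = "application/vnd.google-apps."

# Google-native MIME type -> (export MIME type, file extension)
EXPORT_TARGETS = {
    "application/vnd.google-apps.document": (
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        ".docx",
    ),
    "application/vnd.google-apps.spreadsheet": (
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        ".xlsx",
    ),
    "application/vnd.google-apps.presentation": (
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
        ".pptx",
    ),
}


def plan_download(mime_type: str) -> Tuple[str, Optional[str]]:
    """Return (action, extension) for a Drive file of the given MIME type."""
    if mime_type in EXPORT_TARGETS:
        return ("export", EXPORT_TARGETS[mime_type][1])
    if mime_type.startswith(GOOGLE_NATIVE_PREFIX):
        # Drawings, Forms, and similar: no conversion target (assumed skipped).
        return ("skip", None)
    # PDFs, images, and other binary files are downloaded unchanged.
    return ("as-is", None)
```

For instance, a Google Sheets file maps to ("export", ".xlsx"), a PDF to ("as-is", None), and a Google Drawing to ("skip", None).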

Supported Formats and Size Limits

  • Formats: All file formats supported by the project (see Supported Formats)
  • Size: Up to 100 MB per file

Step 3: Set Up Automatic Schedule (Optional)

To automatically ingest new files on a recurring basis, enable a schedule.

  1. Click the Run schedule dropdown on the source card.
  2. Choose your preferred interval:
    Interval        Description
    Manual only     No automatic ingestion (default)
    Every 6 hours   Automatically ingest new files every 6 hours
    Every 12 hours  Automatically ingest new files every 12 hours
    Every day       Automatically ingest new files every 24 hours
    Every week      Automatically ingest new files every 7 days
  3. Select an interval, turn on the Enabled checkbox, and click Save to apply the schedule.

Once the schedule is active, the source card displays the Last run time and Next run estimate.
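
As a rough illustration of how a Next run estimate could follow from the Last run time, the sketch below adds the selected interval to the last run. The mapping mirrors the interval table in this step; the function next_run_estimate is hypothetical, and the product may compute its estimate differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Interval labels from the schedule dropdown, mapped to durations.
INTERVALS = {
    "Manual only": None,
    "Every 6 hours": timedelta(hours=6),
    "Every 12 hours": timedelta(hours=12),
    "Every day": timedelta(hours=24),
    "Every week": timedelta(days=7),
}


def next_run_estimate(last_run: datetime, interval_label: str,
                      enabled: bool) -> Optional[datetime]:
    """Estimate the next scheduled run, or None if nothing is scheduled."""
    interval = INTERVALS[interval_label]
    if not enabled or interval is None:
        # "Manual only" or a disabled schedule never runs automatically.
        return None
    return last_run + interval
```

With a last run at 09:00 UTC and Every 6 hours selected, the estimate is 15:00 UTC the same day.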

Run Now

Click Import now at any time to trigger an immediate ingestion without waiting for the next scheduled cycle.


Monitoring Ingestion Jobs

The Recent ingestion jobs table at the bottom of the Ingestion page shows the history of all ingestion jobs.

Column               Description
Source               Which source the files were ingested from
Mode                 Manual or Scheduled
Status               Pending / Running / Completed / Partial / Failed
Success / Fail       Number of successfully ingested files and failed files
Started / Completed  Job start and completion timestamps

Click a job row to view the detailed list of ingested files.

Status Reference

Status     Meaning
Pending    The job is queued and waiting to be processed
Running    Files are being downloaded and imported into the project
Completed  All files were successfully ingested
Partial    Some files succeeded, some failed (unsupported format, size exceeded, etc.)
Failed     The entire job failed (connection error, etc.)
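
One way to think about the three finished states is in terms of the Success / Fail counts plus a job-level error. The sketch below is an interpretation only. The function job_status is hypothetical, and two cases are assumptions that the table does not settle: a run with no files at all is treated as Completed, and a run where every file failed is treated as Failed.

```python
def job_status(success: int, failed: int, job_error: bool = False) -> str:
    """Map a finished job's counts to a status label (interpretive sketch)."""
    if job_error:
        # Job-level problem such as a connection error: nothing was processed.
        return "Failed"
    if failed == 0:
        # Assumption: this also covers a run that found no new files.
        return "Completed"
    if success > 0:
        return "Partial"
    # Assumption: every attempted file failed, so the job counts as Failed.
    return "Failed"
```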

FAQ

Connection to the source fails

The connection test verifies access to the specified folder (or root). If it fails, check the following.

  • Make sure the target folder is shared with the service account email (client_email). (Most common cause.)
  • Make sure the Google Drive API is enabled in your Google Cloud project.
  • If you entered a Folder ID, verify it is correct (the string after folders/ in the folder URL).
  • Make sure the pasted service account JSON is valid JSON containing client_email and private_key.

Files in subfolders are not ingested

Ingestion targets only the files directly inside the specified folder; subfolders are not traversed automatically. To ingest files in a subfolder, share that folder with the service account and add a source using its Folder ID.

How are Google Docs / Sheets ingested?

Google Docs, Sheets, and Slides files are automatically converted to .docx, .xlsx, and .pptx, respectively. The original Google file is left untouched; only the converted copy is imported into the project.

Files are missing from ingestion

  • After the first run, ingestion only fetches files modified since the last run. Previously existing but unmodified files are not included.
  • There is a per-run limit on the number of files processed: up to 200 for scheduled (automatic) runs and up to 50 for manual (Import now) runs. Excess files will be picked up in the next run.
  • Files already ingested (same source + same Drive file) are skipped by deduplication.
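
The per-run limit makes it easy to estimate how many runs a large backlog needs. The sketch below is a simple ceiling division, assuming no new files arrive in the meantime and every file ingests successfully; runs_needed is a hypothetical helper.

```python
import math

SCHEDULED_LIMIT = 200  # files per scheduled run (from the text)
MANUAL_LIMIT = 50      # files per manual "Import now" run (from the text)


def runs_needed(backlog: int, manual: bool) -> int:
    """Number of runs required to work through `backlog` eligible files."""
    limit = MANUAL_LIMIT if manual else SCHEDULED_LIMIT
    return math.ceil(backlog / limit)
```

For a backlog of 430 files, this gives 3 scheduled runs (200 + 200 + 30) or 9 manual runs (eight runs of 50 plus one of 30).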

Will deleting a source also delete already ingested files?

No. Deleting a source only removes the connection configuration and schedule settings. Files already imported into your project remain untouched.