Google Dataproc Job

The following table describes parameters for a Google Dataproc job, which performs cloud-based big data processing and machine learning.

Parameter

Description

Connection profile

Defines the name of the connection profile used to connect to Google Cloud Platform.

Project ID

Defines the project ID for your Google Cloud project.

Account Region

Defines the Google Compute Engine region in which to create the job.

Dataproc task type

Defines one of the following Dataproc task types to execute:

  • Workflow Template - a reusable workflow configuration that defines a graph of jobs with information on where to run those jobs

  • Job - a single Dataproc job

Workflow Template

(For a Workflow Template task type) Defines the ID of a Workflow Template.

Parameters (JSON Format)

(For a Job task type) Defines the input parameters to pass to the job execution, in JSON format.

You can retrieve this JSON content from the GCP Dataproc UI by using the EQUIVALENT REST option in the job settings.
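
As a rough illustration of what such JSON content can look like, the following Python sketch builds a body shaped like a Dataproc jobs.submit REST request and prints it as JSON. The cluster name, job ID, bucket path, and arguments are hypothetical placeholders, and the choice of a PySpark job is an assumption. The JSON copied from the EQUIVALENT REST option for your own job remains the authoritative source.

```python
import json

# Hypothetical values for illustration only; replace them with the
# content shown by the EQUIVALENT REST option for your own job.
job_request = {
    "job": {
        "reference": {"jobId": "example-job-001"},        # assumed job ID
        "placement": {"clusterName": "example-cluster"},  # assumed cluster
        "pysparkJob": {
            # assumed Cloud Storage location of the main script
            "mainPythonFileUri": "gs://example-bucket/scripts/wordcount.py",
            "args": ["gs://example-bucket/input/", "gs://example-bucket/output/"],
        },
    }
}

# Serialize to the JSON text that would go in the Parameters field.
parameters_json = json.dumps(job_request, indent=2)
print(parameters_json)
```

Building the value as a dictionary and serializing it with json.dumps guarantees that the resulting text is syntactically valid JSON before it is pasted into the field.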

Verification Poll Interval (in seconds)

(Optional) Defines the number of seconds to wait between checks of the job status.

Default: 20

Tolerance

Defines the number of call retries during the status check phase.

Default: 2
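
The way a poll interval and a retry tolerance typically interact can be sketched in Python as follows. This is a minimal illustration and not the product's implementation: the poll_job function and its check_status callback are hypothetical, and the sketch assumes that Tolerance counts consecutive failed status calls that are retried before the check gives up. The state names DONE, ERROR, and CANCELLED are the terminal states reported by the Dataproc jobs API.

```python
import time
from typing import Callable

# Terminal states reported by the Dataproc jobs API.
TERMINAL_STATES = {"DONE", "ERROR", "CANCELLED"}


def poll_job(
    check_status: Callable[[], str],  # hypothetical callback returning the job state
    interval_seconds: float = 20,     # mirrors Verification Poll Interval
    tolerance: int = 2,               # mirrors Tolerance (assumed semantics)
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state and return that state."""
    consecutive_failures = 0
    while True:
        # Wait one interval before each status check.
        sleep(interval_seconds)
        try:
            state = check_status()
        except Exception:
            # A failed status call is retried up to `tolerance` times.
            consecutive_failures += 1
            if consecutive_failures > tolerance:
                raise
            continue
        # A successful call resets the failure counter.
        consecutive_failures = 0
        if state in TERMINAL_STATES:
            return state
```

With the defaults, the status is checked every 20 seconds, and a status call that fails is retried twice before the error is propagated. The sleep parameter is injectable only so that the loop can be exercised without real waiting.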