Google Dataflow Job

The following table describes parameters for a Google Dataflow job, which performs cloud-based data processing for batch and real-time data streaming applications.

Parameter

Description

Connection profile

Defines the name of a connection profile to use to connect to Google Cloud Platform.

Project ID

Defines the project ID for your Google Cloud project.

Location

Defines the Google Compute Engine region in which to create the job.

Template Type

Defines one of the following types of Google Dataflow templates:

  • Classic Template - Developers run the pipeline and create a template. The Apache Beam SDK stages files in Cloud Storage, creates a template file (similar to a job request), and saves the template file in Cloud Storage.

  • Flex Template - Developers package the pipeline into a Docker image and then use the Google Cloud CLI to build and save the Flex Template spec file in Cloud Storage.
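As an illustration of the difference between the two template types, each is launched through a different Dataflow REST endpoint: a classic template is referenced through a gcsPath query parameter, while a Flex Template spec file is referenced inside the request body. The following is a minimal sketch assuming the public Dataflow v1b3 REST API; the helper names and bucket paths are illustrative, not part of the product:

```python
def classic_launch_request(project, location, template_path, job_name, parameters):
    """Classic template: the template file created by the Apache Beam SDK
    lives in Cloud Storage and is referenced via the gcsPath query parameter."""
    url = (f"https://dataflow.googleapis.com/v1b3/projects/{project}"
           f"/locations/{location}/templates:launch")
    query = {"gcsPath": template_path}
    body = {"jobName": job_name, "parameters": parameters}
    return url, query, body

def flex_launch_request(project, location, spec_path, job_name, parameters):
    """Flex Template: the spec file points at a Docker image and is passed
    inside the request body instead of the query string."""
    url = (f"https://dataflow.googleapis.com/v1b3/projects/{project}"
           f"/locations/{location}/flexTemplates:launch")
    body = {"launchParameter": {"jobName": job_name,
                                "containerSpecGcsPath": spec_path,
                                "parameters": parameters}}
    return url, {}, body
```

Either request would then be sent as an authenticated HTTP POST; the job definition's connection profile supplies those credentials.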

Template Location (gs://)

Defines the path for temporary files. This must be a valid Google Cloud Storage URL that begins with gs://.

If set, the pipeline option tempLocation is used as the default value.

Parameters (JSON Format)

Defines input parameters to pass to the job at execution, in JSON format (name:value pairs).

This JSON must include the jobName and parameters elements, as in the following example:



    "jobName": "wordcount", 

    "parameters": { 

        "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt", 

        "output": "gs://controlmbucket/counts" 

    } 
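A minimal Python sketch of building and validating this payload before submitting the job; the helper function is illustrative, and the bucket paths are taken from the example above:

```python
import json

def build_template_payload(job_name, parameters):
    """Build the JSON body expected by the job definition:
    a jobName element plus a parameters map of name:value pairs."""
    if not job_name:
        raise ValueError("jobName is required")
    return {"jobName": job_name, "parameters": dict(parameters)}

payload = build_template_payload(
    "wordcount",
    {
        "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
        "output": "gs://controlmbucket/counts",
    },
)
print(json.dumps(payload, indent=4))
```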

Verification Poll Interval (in seconds)

(Optional) Defines the number of seconds to wait between status checks of the job.

Default: 10
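Conceptually, this interval drives a polling loop like the following sketch; the function and state names are illustrative (the state strings follow the Dataflow job-state naming convention), not the product's implementation:

```python
import time

def wait_for_job(get_status, poll_interval=10, timeout=600):
    """Poll a job's status every poll_interval seconds until it reaches
    a terminal state or the timeout expires. get_status is any callable
    returning the job's current state string."""
    terminal = {"JOB_STATE_DONE", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_status()
        if state in terminal:
            return state
        time.sleep(poll_interval)
    raise TimeoutError("job did not reach a terminal state in time")

# Usage with a stand-in status source:
states = iter(["JOB_STATE_RUNNING", "JOB_STATE_RUNNING", "JOB_STATE_DONE"])
result = wait_for_job(lambda: next(states), poll_interval=0)
```

A larger interval reduces API traffic at the cost of slower detection of job completion.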

Log Level

Determines one of the following levels of detail to retrieve from the GCP logs when the job fails:

  • TRACE

  • DEBUG

  • INFO

  • WARN

  • ERROR