Databricks Connector

The Databricks component allows you to list, import, and automate Databricks jobs.

Prerequisites

  • Version 9.2.9 or later
  • Component Connection Management version 1.0.0.3
  • Privileges Required to use Connections
  • Privileges Required to use Databricks

Contents of the Component

| Object Type | Name | Description |
| --- | --- | --- |
| Application | GLOBAL.Redwood.REDWOOD.Databricks | Integration connector with the Databricks system |
| Constraint Definition | REDWOOD.Redwood_DatabricksConnectionConstraint | Constraint for Databricks Connection fields |
| Extension Point | REDWOOD.Redwood_DatabricksConnection | Databricks Connector |
| Process Definition | REDWOOD.Redwood_Databricks_ImportJob | Import a job from Databricks |
| Process Definition | REDWOOD.Redwood_Databricks_RunJob | Run a job in Databricks |
| Process Definition | REDWOOD.Redwood_Databricks_RunJob_Template | Template definition to run a job in Databricks |
| Process Definition | REDWOOD.Redwood_Databricks_ShowJobs | List all existing jobs in Databricks |
| Process Definition Type | REDWOOD.Redwood_Databricks | Databricks Connector |
| Library | REDWOOD.Redwood_Databricks | Library for the Databricks connector |

Process Definitions

Redwood_Databricks_ImportJob

Imports one or more Databricks jobs as RunMyJobs Process Definitions. Specify a Job Name Filter to control which jobs are imported, and Generation Settings to control the attributes of the generated definitions.

Parameters

| Tab | Name | Description | Documentation | Data Type | Direction | Default Expression | Values |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Parameters | connection | Connection | The Connection object that defines the connection to the Databricks application. | String | In | | |
| Parameters | filter | Job Name Filter | Limits the jobs returned to those whose names match the filter. Wildcards * and ? are allowed. | String | In | | |
| Parameters | overwrite | Overwrite Existing Definition | When set to Yes, a definition that already exists with the same name as the name generated for the imported object is overwritten by the new import. When set to No, the import for that template is skipped if a definition with the same name already exists. | String | In | N | Y,N |
| Generation Settings | targetPartition | Partition | The Partition in which to create the new definitions. | String | In | | |
| Generation Settings | targetApplication | Application | The Application in which to create the new definitions. | String | In | | |
| Generation Settings | targetQueue | Default Queue | The default Queue to assign to the generated definitions. | String | In | | |
| Generation Settings | targetPrefix | Definition Name Prefix | The prefix added to the name of the imported Databricks job to create the definition name. | String | In | CUS_DBCKS_ | |
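The Job Name Filter uses simple shell-style wildcards rather than regular expressions: * matches any run of characters and ? matches exactly one. As an illustration only (not the connector's internal code), Python's standard fnmatch module implements the same wildcard semantics:

```python
from fnmatch import fnmatchcase

# Hypothetical job names, for illustration; the connector applies the same
# wildcard semantics when selecting which Databricks jobs to import.
jobs = ["Daily_Load", "Daily_Extract", "Weekly_Load", "Daily_X"]

print([j for j in jobs if fnmatchcase(j, "Daily_*")])  # ['Daily_Load', 'Daily_Extract', 'Daily_X']
print([j for j in jobs if fnmatchcase(j, "Daily_?")])  # ['Daily_X']
```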

Redwood_Databricks_RunJob

Runs a Databricks job and monitors it until completion. The RunMyJobs process remains in a Running state until the Databricks job completes. If the Databricks job succeeds, the RunMyJobs process completes successfully; if the Databricks job fails, the RunMyJobs process completes in Error, and any available error information is written to the stdout.log file.

Parameters are available on the definition to pass input parameters to the different types of Databricks tasks. For example, adding a value to the Python Parameters parameter makes that value available to all Python tasks in the Databricks job. If the job does not require parameters for a certain task type, leave that parameter empty. See the parameters table below for more information.
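For readers who want to relate this behavior to the underlying platform, the sketch below shows the equivalent trigger-and-poll pattern against the public Databricks Jobs 2.1 REST API. It is an illustration under stated assumptions, not the connector's implementation; the workspace URL, token, and job ID are placeholders.

```python
import time
import requests

# Placeholders: substitute your workspace URL and a valid token.
HOST = "https://<your-workspace>.cloud.databricks.com"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

def run_job_and_wait(job_id: int, poll_seconds: int = 30) -> dict:
    # Trigger the job via the Jobs 2.1 run-now endpoint; the run_id returned
    # here corresponds to the connector's runId Out parameter.
    resp = requests.post(f"{HOST}/api/2.1/jobs/run-now",
                         headers=HEADERS, json={"job_id": job_id})
    resp.raise_for_status()
    run_id = resp.json()["run_id"]

    # Poll the run until it reaches a terminal state, mirroring the RunMyJobs
    # process staying in Running until the Databricks job finishes.
    while True:
        run = requests.get(f"{HOST}/api/2.1/jobs/runs/get",
                           headers=HEADERS, params={"run_id": run_id}).json()
        state = run["state"]
        if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            break
        time.sleep(poll_seconds)

    # SUCCESS maps to a successful process; anything else maps to Error, with
    # the state message as the error detail (the connector writes it to stdout.log).
    if state.get("result_state") != "SUCCESS":
        raise RuntimeError(f"Run {run_id} failed: {state.get('state_message', '')}")
    return run
```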

Parameters

| Name | Description | Documentation | Data Type | Direction | Default Expression | Values |
| --- | --- | --- | --- | --- | --- | --- |
| connection | Connection | The Connection object that defines the connection to the Databricks application. | String | In | | |
| jobId | Job Id to run | The Id of the Databricks job to execute. | String | In | | |
| sparkJarParameters | Spark Jar Parameters | An array of Spark Jar parameters to be used on the Databricks job. | String | In | | |
| sparkSubmitParameters | Spark Submit Parameters | An array of Spark Submit parameters to be used on the Databricks job. | String | In | | |
| notebookParameters | Notebook Parameters | An array of key=value pairs of Notebook parameters to be used on the Databricks job. | String | In | | |
| pythonParameters | Python Parameters | An array of Python parameters to be used on the Databricks job. | String | In | | |
| pythonNamedParameters | Python Named Parameters | An array of key=value pairs of Python Named parameters to be used on the Databricks job. | String | In | | |
| sqlParameters | SQL Parameters | An array of key=value pairs of SQL parameters to be used on the Databricks job. | String | In | | |
| dbtParameters | DBT Parameters | An array of DBT parameters to be used on the Databricks job. | String | In | | |
| pipelineFullRefresh | Pipeline Full Refresh | Whether a full refresh should be performed on the Databricks pipeline job. | String | In | | Y=Yes, N=No |
| runId | Databricks Run Id | The Run Id of the executed job on the Databricks side. | String | Out | | |
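The array-valued parameters above correspond closely to fields of the Databricks Jobs 2.1 run-now request body: key=value arrays become maps, plain arrays stay lists. The helper below is a hypothetical sketch of that mapping (only a few of the parameter types are shown; the field names are from the public Databricks API, the helper itself is not part of the connector):

```python
def build_run_now_payload(job_id, notebook_parameters=None,
                          python_parameters=None, pipeline_full_refresh="N"):
    # Hypothetical helper: shows how the connector-style parameter arrays
    # could translate into a Jobs 2.1 run-now request body.
    payload = {"job_id": job_id}
    if notebook_parameters:
        # key=value pairs become the notebook_params map
        payload["notebook_params"] = dict(p.split("=", 1) for p in notebook_parameters)
    if python_parameters:
        # plain values pass through as the python_params list
        payload["python_params"] = python_parameters
    if pipeline_full_refresh == "Y":
        payload["pipeline_params"] = {"full_refresh": True}
    return payload

# Example: two notebook parameters and a full pipeline refresh.
print(build_run_now_payload(123,
                            notebook_parameters=["env=prod", "date=2023-01-31"],
                            pipeline_full_refresh="Y"))
```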

Redwood_Databricks_RunJob_Template

This template definition is provided to facilitate creating definitions that run specific Databricks jobs. Its functionality and parameters are the same as those of the Redwood_Databricks_RunJob definition. To create a definition, choose New (from Template) from the context menu of Redwood_Databricks_RunJob_Template.

note

To provide a default value for the Connection in the Connection parameter of the template, you must use the full Business Key of the Connection: EXTConnection:<Partition>.<ConnectionName>. Example: EXTConnection:GLOBAL.MyDatabricksConnection

Parameters

| Name | Description | Documentation | Data Type | Direction | Default Expression | Values |
| --- | --- | --- | --- | --- | --- | --- |
| connection | Connection | The Connection object that defines the connection to the Databricks application. | String | In | | |
| jobId | Job Id to run | The Id of the Databricks job to execute. | String | In | | |
| sparkJarParameters | Spark Jar Parameters | An array of Spark Jar parameters to be used on the Databricks job. | String | In | | |
| sparkSubmitParameters | Spark Submit Parameters | An array of Spark Submit parameters to be used on the Databricks job. | String | In | | |
| notebookParameters | Notebook Parameters | An array of key=value pairs of Notebook parameters to be used on the Databricks job. | String | In | | |
| pythonParameters | Python Parameters | An array of Python parameters to be used on the Databricks job. | String | In | | |
| pythonNamedParameters | Python Named Parameters | An array of key=value pairs of Python Named parameters to be used on the Databricks job. | String | In | | |
| sqlParameters | SQL Parameters | An array of key=value pairs of SQL parameters to be used on the Databricks job. | String | In | | |
| dbtParameters | DBT Parameters | An array of DBT parameters to be used on the Databricks job. | String | In | | |
| pipelineFullRefresh | Pipeline Full Refresh | Whether a full refresh should be performed on the Databricks pipeline job. | String | In | | Y=Yes, N=No |
| runId | Databricks Run Id | The Run Id of the executed job on the Databricks side. | String | Out | | |

Redwood_Databricks_ShowJobs

Lists all existing jobs in Databricks by fetching information about the available Databricks jobs. Properties of the returned jobs are written to the stdout.log file, to a file named listing.rtx, and to the Out parameter Job Listing.

Parameters

| Name | Description | Documentation | Data Type | Direction | Default Expression | Values |
| --- | --- | --- | --- | --- | --- | --- |
| connection | Connection | The Connection object that defines the connection to the Databricks application. | String | In | | |
| filter | Job Name Filter | Limits the jobs returned to those whose names match the filter. Wildcards * and ? are allowed. | String | In | | |
| listing | Job listing | The listing of all available jobs that match the input filter (or all jobs if no filter was provided). | Table | Out | | |
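As a hedged illustration of what such a listing involves at the API level (not the connector's own code), the following sketch pages through the Databricks Jobs 2.1 jobs/list endpoint and applies the wildcard name filter client-side; the workspace URL and token are placeholders:

```python
from fnmatch import fnmatchcase

import requests

# Placeholders: substitute your workspace URL and a valid token.
HOST = "https://<your-workspace>.cloud.databricks.com"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

def list_jobs(name_filter="*"):
    # Page through the Jobs 2.1 list endpoint and keep jobs whose names match
    # the * / ? wildcard filter, mirroring the Job Name Filter parameter.
    jobs, params = [], {"limit": 25}
    while True:
        page = requests.get(f"{HOST}/api/2.1/jobs/list",
                            headers=HEADERS, params=params).json()
        for job in page.get("jobs", []):
            name = job["settings"]["name"]
            if fnmatchcase(name, name_filter):
                jobs.append({"job_id": job["job_id"], "name": name})
        if not page.get("has_more"):
            break
        params["page_token"] = page["next_page_token"]  # token-based paging
    return jobs
```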

Procedure

Create a Connection To Databricks

  1. Navigate to Custom > Connections and choose New.
  2. Choose Databricks Connection under Select a Connection Type.
  3. Choose Next or Basic Properties. This screen is common to all components; here you create a queue and process server for your Databricks connection, and all required settings are set automatically.
  4. Choose Next or Security. This screen is common to all components; here you specify which roles can access the connection information.
  5. Choose Next or Databricks Connection Properties. This screen is specific to the Databricks connector; here you choose between Basic and Personal Access Token authentication (a quick way to verify the URL and token outside RunMyJobs is sketched after this list):
    1. For Basic authentication, specify the URL, Username, and Password.
    2. For Personal Access Token authentication, specify the URL, Username, and Access Token.
  6. Navigate to Environment > Process Server, locate your Databricks process server, start it, and ensure it reaches status Running.
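Before starting the process server, it can be useful to confirm the URL and token outside RunMyJobs. A minimal sketch, assuming a Personal Access Token and using the jobs/list endpoint purely as an authenticated test call (the workspace URL and token are placeholders):

```python
import requests

# Placeholders: use the same URL and Personal Access Token you entered on the
# Databricks Connection Properties screen.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

resp = requests.get(f"{HOST}/api/2.1/jobs/list",
                    headers={"Authorization": f"Bearer {TOKEN}"},
                    params={"limit": 1})
# 200 means the URL and token are valid; 401/403 points to a credential problem.
print(resp.status_code, resp.reason)
```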

Listing Databricks Jobs

  1. Navigate to Definitions > Processes.
  2. Choose Submit from the context menu of Redwood_Databricks_ShowJobs.
  3. Select the connection in the Connection field, specify an optional name filter in the Job Name Filter parameter, and choose Submit to list all available jobs.

Locating Connection Settings

  1. Navigate to Custom > Connections.
  2. The first line is used for filtering connections; you filter on Connection Type, Name, or Description by simply starting to type.

See Also

  • Catalog
  • Privileges Required to use Databricks