Load data into Azure SQL Database from Azure Databricks
In this article, we will learn how to load data into Azure SQL Database from Azure Databricks using Scala and Python notebooks.
With unprecedented volumes of data being generated, captured, and shared by organizations, fast processing of this data to gain meaningful insights has become a dominant concern for businesses. One of the popular frameworks that offer fast processing and analysis of big data workloads is Apache Spark.
Azure Databricks is the implementation of Apache Spark analytics on Microsoft Azure, and it integrates well with several Azure services like Azure Blob Storage, Azure Synapse Analytics, and Azure SQL Database. Spinning up clusters in a fully managed Apache Spark environment, with the benefits of the Azure cloud platform, has never been easier. In case you are new to Databricks, you can learn its basics from this tutorial here.
Data processing is one vital step in the overall data life cycle. Once this data is processed with the help of fast processing clusters, it needs to be stored in storage repositories so that it can be easily accessed and analyzed for a variety of future purposes like reporting.
In this article, we will load the processed data into the SQL Database on Azure from Azure Databricks. Databricks in Azure supports APIs for several languages like Scala, Python, R, and SQL. As Apache Spark is written in Scala, this language choice for programming is the fastest one to use.
Let's go ahead and demonstrate the data load into SQL Database using both Scala and Python notebooks from Databricks on Azure.
Preparations before the demo
Before we start with our exercise, we will need the following prerequisites:
- You need to have an active Azure subscription. If you don't have one, you can create it here
- Azure Databricks – You need to set up both the Databricks service and a cluster in Azure; you can go over the steps in this article, A beginner's guide to Azure Databricks, to create these. As shown in that article, we have created a Databricks service named "azdatabricks" and a Databricks cluster named "azdatabrickscluster"
- Azure SQL Database – Creating a SQL Database on Azure is a straightforward process. The steps below give a quick idea of how to create a SQL Database on Azure
On the Azure portal, you can either directly click on the Create a resource button or on SQL databases on the left vertical menu bar to land on the Create SQL Database screen.
Provide details like the database name, its configuration, and create or select the server name. Click on the Review + create button to create this SQL database on Azure.
Check out the official documentation by Microsoft, Create an Azure SQL Database, where the process to create a SQL database is described in great detail.
Uploading a CSV file on Azure Databricks Cluster
We will be loading a CSV file (semi-structured data) into the Azure SQL Database from Databricks. To that end, let's quickly upload a CSV file on the Databricks portal. You can download it from here. Click on the Data icon on the left vertical menu bar and select the Add Data button.
Browse and choose the file that you want to upload on Azure Databricks.
Once uploaded, you can see the file "1000 Sales Records.csv" on the Azure Databricks service. Take a note of the path of the file: /FileStore/tables/1000_Sales_Records-d540d.csv. We will use this path in the notebooks to read the data.
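If you want to confirm the upload from a notebook rather than the UI, you can list the DBFS folder. This is just a convenience sketch, assuming the file landed in the default /FileStore/tables location shown above:

// List the files under /FileStore/tables and look for the uploaded CSV
display(dbutils.fs.ls("/FileStore/tables"))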
Load data into Azure SQL Database from Azure Databricks using Scala
Click on the Create button and select Notebook on the Workspace icon to create a Notebook.
Type in a name for the notebook and select Scala as the language. The cluster name is self-populated since there was only one cluster created; in case you have more clusters, you can always select from the drop-down list of your clusters. Finally, click Create to create a Scala notebook.
We will start by typing in the code, as shown in the following screenshot. Let's break this chunk of code into small parts and try to understand it.
In the below code, we will first create the JDBC URL, which contains information like the SQL Server and SQL Database name on Azure, along with other details like the port number, user, and password.
val url = "jdbc:sqlserver://azsqlshackserver.database.windows.net:1433;database=azsqlshackdb;user=gauri;password=*******"
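As a side note, hard-coding the user and password in the URL is fine for a demo, but in practice you would typically read them from a Databricks secret scope instead. A minimal sketch, assuming a secret scope named azsqlshack with keys sqluser and sqlpassword has been created beforehand (these names are hypothetical):

// Hypothetical secret scope and key names; create them with the Databricks CLI before running this
val sqlUser = dbutils.secrets.get(scope = "azsqlshack", key = "sqluser")
val sqlPassword = dbutils.secrets.get(scope = "azsqlshack", key = "sqlpassword")
val urlFromSecrets = s"jdbc:sqlserver://azsqlshackserver.database.windows.net:1433;database=azsqlshackdb;user=$sqlUser;password=$sqlPassword"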
Next, we will create a Properties() object to hold the connection parameters.
import java.util.Properties

val myproperties = new Properties()
myproperties.put("user", "gauri")
myproperties.put("password", "******")
The following code helps to check the connectivity to the SQL Server database.
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
myproperties.setProperty("Driver", driverClass)
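Strictly speaking, the two lines above only register the driver name in the connection properties. If you want to fail fast on a bad server name or password before writing any data, one option (a sketch, not part of the original demo) is to open and immediately close a plain JDBC connection:

// Optional connectivity check: open a JDBC connection using the URL above and close it right away
Class.forName(driverClass)
val testConnection = java.sql.DriverManager.getConnection(url)
println("Connected to: " + testConnection.getMetaData.getDatabaseProductName)
testConnection.close()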
Lastly, we will read the CSV file into the mydf data frame. With the header = true option, the values in the first row of the CSV file will be treated as the data frame's column names. Using inferSchema = true, we are telling Spark to automatically infer the schema of each column.
val mydf = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/FileStore/tables/1000_Sales_Records-d540d.csv")
We will use the display() function to show the records of the mydf data frame.
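For reference, the call looks like this; printSchema() is an optional extra to confirm the types picked up by inferSchema:

// Render the first rows of the data frame as a table in the notebook
display(mydf)

// Optionally confirm the column types that inferSchema inferred
mydf.printSchema()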
Transforming the data
Now, let's try to do some quick data munging on the dataset; we will rename the column SalesChannel to SalesPlatform using the withColumnRenamed() function.
val transformedmydf = mydf.withColumnRenamed("SalesChannel", "SalesPlatform")
display(transformedmydf)
Before we load the transformed data into the Azure SQL Database, let's quickly take a peek at the database on the Azure portal. For this, go to the portal, select the SQL database, and click on the Query editor (preview). Then provide your login and password to query the SQL database on Azure, and click OK.
The below screenshot shows that, currently, there are no tables and no data in this database.
Loading the processed data into Azure SQL Database using Scala
On the Azure Databricks portal, execute the below code. This will load the CSV data into a table named SalesTotalProfit in the SQL Database on Azure.
transformedmydf.write.jdbc(url, "SalesTotalProfit", myproperties)
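Note that write.jdbc fails by default if the target table already exists, so re-running this cell will throw an error. If you expect to rerun the load, one variant (a sketch, not part of the original demo) is to set the save mode explicitly:

import org.apache.spark.sql.SaveMode

// Overwrite the existing table (or use SaveMode.Append) so the cell can be re-executed safely
transformedmydf.write.mode(SaveMode.Overwrite).jdbc(url, "SalesTotalProfit", myproperties)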
Head back to the Azure portal, refresh the window and execute the below query to select records from the SalesTotalProfit table.
SELECT * FROM [dbo].[SalesTotalProfit]
The data is loaded into the table SalesTotalProfit in the database azsqlshackdb on Azure. And you can perform any operations on the data, as you would in any regular database.
UPDATE [dbo].[SalesTotalProfit]
SET ItemType = 'Clothing'
WHERE ItemType = 'Clothes'

SELECT * FROM [dbo].[SalesTotalProfit]
The following code reads data from the SalesTotalProfit table in Databricks. Here, we are processing and aggregating the data per Region and displaying the results.
val azsqldbtable = spark.read.jdbc(url, "SalesTotalProfit", myproperties)
display(azsqldbtable.select("Region", "TotalProfit").groupBy("Region").avg("TotalProfit"))
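The aggregation above runs on the Databricks cluster after the whole table has been pulled over JDBC. For a 1,000-row table that is fine, but as an alternative sketch you could push the aggregation down to Azure SQL Database by passing a parenthesized subquery, aliased as a table, to spark.read.jdbc:

// Let SQL Server compute the average per Region; Spark only receives the aggregated rows
val pushdownQuery = "(SELECT Region, AVG(TotalProfit) AS AvgTotalProfit FROM SalesTotalProfit GROUP BY Region) aggregated"
val avgByRegion = spark.read.jdbc(url, pushdownQuery, myproperties)
display(avgByRegion)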
Load data into Azure SQL Database from Azure Databricks using Python
Let's create a new notebook for the Python demonstration. Just select Python as the language option when you are creating this notebook. We will name this notebook loadintoazsqldb.
The following code sets various parameters like the server name, database name, user, and password.
jdbcHostname = "azsqlshackserver.database.windows.net"
jdbcPort = "1433"
jdbcDatabase = "azsqlshackdb"
properties = {
  "user": "gauri",
  "password": "******"
}
The below code creates a JDBC URL. We will use sqlContext to read the CSV file, and the mydf data frame is created as shown in the screenshot below.
url = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
mydf = sqlContext.read.csv("/FileStore/tables/1000_Sales_Records-d540d.csv", header=True)
We will import the pandas library and, using the DataFrameWriter function, we will load the CSV data into a new data frame named myfinaldf. And finally, we write this data frame into the table TotalProfit with the given properties. In case this table already exists, we can overwrite it using the mode overwrite.
from pyspark.sql import *
import pandas as pd

myfinaldf = DataFrameWriter(mydf)
myfinaldf.jdbc(url=url, table="TotalProfit", mode="overwrite", properties=properties)
Go to Azure Portal, navigate to the SQL database, and open Query Editor. Open the Tables folder to see the CSV data successfully loaded into the table TotalProfit in the Azure SQL database, azsqlshackdb.
Conclusion
Azure Databricks, a fast and collaborative Apache Spark-based analytics service, integrates seamlessly with a number of Azure services, including Azure SQL Database. In this article, we demonstrated step-by-step processes to populate SQL Database from Databricks using both Scala and Python notebooks.
Source: https://www.sqlshack.com/load-data-into-azure-sql-database-from-azure-databricks/