Load data into Azure SQL Database from Azure Databricks
In this article, we will learn how to load data into Azure SQL Database from Azure Databricks using Scala and Python notebooks.
With unprecedented volumes of data being generated, captured, and shared by organizations, fast processing of this data to gain meaningful insights has become a dominant concern for businesses. One of the popular frameworks that offer fast processing and analysis of big data workloads is Apache Spark.
Azure Databricks is the implementation of Apache Spark analytics on Microsoft Azure, and it integrates well with several Azure services like Azure Blob Storage, Azure Synapse Analytics, and Azure SQL Database. Spinning up clusters in a fully managed Apache Spark environment, with the benefits of the Azure cloud platform, has never been easier. In case you are new to Databricks, you can learn its basics from this tutorial here.
Data processing is one vital step in the overall data life cycle. Once this data is processed with the help of fast processing clusters, it needs to be stored in storage repositories so that it can be easily accessed and analyzed for a variety of future purposes like reporting.
In this article, we will load the processed data into the SQL Database on Azure from Azure Databricks. Databricks in Azure supports APIs for several languages like Scala, Python, R, and SQL. As Apache Spark is written in Scala, this language choice for programming is the fastest one to use.
Let's go ahead and demonstrate the data load into SQL Database using both Scala and Python notebooks from Databricks on Azure.
Preparations before the demo
Before we start with our exercise, we will need the following prerequisites:
- You need to have an active Azure subscription. If you don't have one, you can create it here
- Azure Databricks – You need to set up both the Databricks service and a cluster in Azure; you can go over the steps in this article, A beginner's guide to Azure Databricks, to create these. As shown in that article, we have created a Databricks service named "azdatabricks" and a Databricks cluster named "azdatabrickscluster"
- Azure SQL Database – Creating a SQL Database on Azure is a straightforward process. The steps below give a quick idea of how to create a SQL Database on Azure
On the Azure portal, you can either directly click on the Create a resource button or on SQL databases on the left vertical menu bar to land on the Create SQL Database screen.
Provide details like the database name, its configuration, and create or select the server name. Click on the Review + create button to create this SQL database on Azure.
Check out the official documentation by Microsoft, Create an Azure SQL Database, where the process to create a SQL database is described in great detail.
Uploading a CSV file on Azure Databricks Cluster
We will be loading a CSV file (semi-structured data) into the Azure SQL Database from Databricks. To that end, let's quickly upload a CSV file on the Databricks portal. You can download it from here. Click on the Data icon on the left vertical menu bar and select the Add Data button.
Browse and choose the file that you want to upload on Azure Databricks.
Once uploaded, you can see the file "1000 Sales Records.csv" on the Azure Databricks service. Take a note of the path of the file: /FileStore/tables/1000_Sales_Records-d540d.csv. We will use this path in the notebooks to read the data.
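If you want to confirm the upload from a notebook rather than the UI, you can list the DBFS folder. This is just a convenience sketch, assuming the file landed in the default /FileStore/tables location shown above:

// List the files under /FileStore/tables and look for the uploaded CSV
display(dbutils.fs.ls("/FileStore/tables"))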
Load data into Azure SQL Database from Azure Databricks using Scala
Click on the Create button and select Notebook on the Workspace icon to create a Notebook.
Type in a name for the notebook and select Scala as the language. The cluster name is self-populated since there was only one cluster created; in case you have more clusters, you can always select from the drop-down list of your clusters. Finally, click Create to create a Scala notebook.
We will start by typing in the code, as shown in the following screenshot. Let's break this chunk of code into small parts and try to understand it.
In the below code, we will first create the JDBC URL, which contains information like the SQL Server and SQL Database name on Azure, along with other details like the port number, user, and password.
val url = "jdbc:sqlserver://azsqlshackserver.database.windows.net:1433;database=azsqlshackdb;user=gauri;password=*******"
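As a side note, hard-coding the user and password in the URL is fine for a demo, but in practice you would typically read them from a Databricks secret scope instead. A minimal sketch, assuming a secret scope named azsqlshack with keys sqluser and sqlpassword has been created beforehand (these names are hypothetical):

// Hypothetical secret scope and key names; create them with the Databricks CLI before running this
val sqlUser = dbutils.secrets.get(scope = "azsqlshack", key = "sqluser")
val sqlPassword = dbutils.secrets.get(scope = "azsqlshack", key = "sqlpassword")
val urlFromSecrets = s"jdbc:sqlserver://azsqlshackserver.database.windows.net:1433;database=azsqlshackdb;user=$sqlUser;password=$sqlPassword"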
Next, we will create a Properties() object to hold the connection parameters.
import java.util.Properties

val myproperties = new Properties()
myproperties.put("user", "gauri")
myproperties.put("password", "******")
The following code helps to check the connectivity to the SQL Server database.
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
myproperties.setProperty("Driver", driverClass)
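Strictly speaking, the two lines above only register the driver name in the connection properties. If you want to fail fast on a bad server name or password before writing any data, one option (a sketch, not part of the original demo) is to open and immediately close a plain JDBC connection:

// Optional connectivity check: open a JDBC connection using the URL above and close it right away
Class.forName(driverClass)
val testConnection = java.sql.DriverManager.getConnection(url)
println("Connected to: " + testConnection.getMetaData.getDatabaseProductName)
testConnection.close()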
Lastly, we will read the CSV file into the mydf data frame. With the header = true option, the values in the first row of the CSV file will be treated as the data frame's column names. Using inferSchema = true, we are telling Spark to automatically infer the schema of each column.
val mydf = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/FileStore/tables/1000_Sales_Records-d540d.csv")
We will use the display() function to show the records of the mydf data frame.
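For reference, the call looks like this; printSchema() is an optional extra to confirm the types picked up by inferSchema:

// Render the first rows of the data frame as a table in the notebook
display(mydf)

// Optionally confirm the column types that inferSchema inferred
mydf.printSchema()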
Transforming the data
Now, let's try to do some quick data munging on the dataset; we will rename the column SalesChannel to SalesPlatform using the withColumnRenamed() function.
val transformedmydf = mydf.withColumnRenamed("SalesChannel", "SalesPlatform")
display(transformedmydf)
Before we load the transformed data into the Azure SQL Database, let's quickly take a peek at the database on the Azure portal. For this, go to the portal, select the SQL database, and click on the Query editor (preview). Then provide your login and password to query the SQL database on Azure, and click OK.
The below screenshot shows that, currently, there are no tables and no data in this database.
Loading the processed data into Azure SQL Database using Scala
On the Azure Databricks portal, execute the below code. This will load the CSV data into a table named SalesTotalProfit in the SQL Database on Azure.
transformedmydf.write.jdbc(url, "SalesTotalProfit", myproperties)
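Note that write.jdbc fails by default if the target table already exists, so re-running this cell will throw an error. If you expect to rerun the load, one variant (a sketch, not part of the original demo) is to set the save mode explicitly:

import org.apache.spark.sql.SaveMode

// Overwrite the existing table (or use SaveMode.Append) so the cell can be re-executed safely
transformedmydf.write.mode(SaveMode.Overwrite).jdbc(url, "SalesTotalProfit", myproperties)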
Head back to the Azure portal, refresh the window and execute the below query to select records from the SalesTotalProfit table.
SELECT * FROM [dbo].[SalesTotalProfit]
The data is loaded into the table SalesTotalProfit in the database azsqlshackdb on Azure. And you can perform any operations on the data, as you would in any regular database.
UPDATE [dbo].[SalesTotalProfit]
SET ItemType = 'Clothing'
WHERE ItemType = 'Clothes'

SELECT * FROM [dbo].[SalesTotalProfit]
The following code reads data from the SalesTotalProfit table in Databricks. Here, we are processing and aggregating the data per Region and displaying the results.
val azsqldbtable = spark.read.jdbc(url, "SalesTotalProfit", myproperties)
display(azsqldbtable.select("Region", "TotalProfit").groupBy("Region").avg("TotalProfit"))
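The aggregation above runs on the Databricks cluster after the whole table has been pulled over JDBC. For a 1,000-row table that is fine, but as an alternative sketch you could push the aggregation down to Azure SQL Database by passing a parenthesized subquery, aliased as a table, to spark.read.jdbc:

// Let SQL Server compute the average per Region; Spark only receives the aggregated rows
val pushdownQuery = "(SELECT Region, AVG(TotalProfit) AS AvgTotalProfit FROM SalesTotalProfit GROUP BY Region) aggregated"
val avgByRegion = spark.read.jdbc(url, pushdownQuery, myproperties)
display(avgByRegion)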
Load data into Azure SQL Database from Azure Databricks using Python
Let's create a new notebook for the Python demonstration. Just select Python as the language option when you are creating this notebook. We will name this notebook loadintoazsqldb.
The following code sets various parameters like the server name, database name, user, and password.
jdbcHostname = "azsqlshackserver.database.windows.net"
jdbcPort = "1433"
jdbcDatabase = "azsqlshackdb"
properties = {
  "user": "gauri",
  "password": "******"
}
The below code creates a JDBC URL. We will use sqlContext to read the CSV file, and the mydf data frame is created as shown in the screenshot below.
url = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
mydf = sqlContext.read.csv("/FileStore/tables/1000_Sales_Records-d540d.csv", header=True)
We will import the pandas library and, using the DataFrameWriter function, we will load the CSV data into a new data frame named myfinaldf. And finally, we write this data frame into the table TotalProfit with the given properties. In case this table already exists, we can overwrite it using the mode overwrite.
from pyspark.sql import *
import pandas as pd

myfinaldf = DataFrameWriter(mydf)
myfinaldf.jdbc(url=url, table="TotalProfit", mode="overwrite", properties=properties)
Go to Azure Portal, navigate to the SQL database, and open Query Editor. Open the Tables folder to see the CSV data successfully loaded into the table TotalProfit in the Azure SQL database, azsqlshackdb.
Conclusion
Azure Databricks, a fast and collaborative Apache Spark-based analytics service, integrates seamlessly with a number of Azure services, including Azure SQL Database. In this article, we demonstrated step-by-step processes to populate SQL Database from Databricks using both Scala and Python notebooks.
Source: https://www.sqlshack.com/load-data-into-azure-sql-database-from-azure-databricks/