Thursday, June 19, 2025

Snowflake

 

1. Snowflake Basics

Q1: What is Snowflake, and how is it different from traditional databases?

Answer:

  • Snowflake is a cloud-based data warehouse built for scalability, performance, and ease of use.

  • Key differences:

    • Separates storage and compute (unlike traditional databases).

    • Auto-scaling (no manual sharding or partitioning).

    • Supports semi-structured data (JSON, Avro, Parquet) natively.

    • Zero-copy cloning (instantly duplicate databases/tables without extra storage).

    • Time Travel (recover deleted/modified data).


2. Setting Up Snowflake

Q2: How do you create a database, table, and warehouse in Snowflake?

Answer:

Create a Database

sql
Copy
Download
CREATE DATABASE sales_db;

Create a Table

sql
Copy
Download
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    signup_date DATE
);

Create a Virtual Warehouse (Compute Cluster)

sql
Copy
Download
CREATE WAREHOUSE analytics_wh
WITH 
    WAREHOUSE_SIZE = 'X-SMALL'
    AUTO_SUSPEND = 300  -- Pauses after 5 mins of inactivity
    AUTO_RESUME = TRUE;

3. Data Loading & Querying

Q3: How do you load CSV and JSON data into Snowflake?

Answer:

Loading CSV Data

sql
Copy
Download
-- 1. Create a stage (cloud storage reference)
CREATE STAGE my_csv_stage
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- 2. Upload file to stage (via SnowSQL or UI)
-- 3. Load into table
COPY INTO customers
FROM @my_csv_stage/customers.csv;

Loading JSON Data

sql
Copy
Download
-- 1. Create a table with VARIANT column
CREATE TABLE json_data (
    raw_data VARIANT
);

-- 2. Load JSON file
COPY INTO json_data
FROM @my_stage/data.json
FILE_FORMAT = (TYPE = 'JSON');

Querying JSON Data

sql
Copy
Download
SELECT 
    raw_data:customer_id::INT AS customer_id,
    raw_data:address.city::STRING AS city
FROM json_data;

4. Advanced Querying & Optimization

Q4: How do you optimize query performance in Snowflake?

Answer:

Best Practices:

✅ Use clustering keys (for large tables):

sql
Copy
Download
ALTER TABLE large_table CLUSTER BY (date_column);

✅ Materialized views (for frequent queries):

sql
Copy
Download
CREATE MATERIALIZED VIEW mv_sales_summary AS
SELECT product_id, SUM(amount) 
FROM sales 
GROUP BY product_id;

✅ Warehouse scaling (for heavy workloads):

sql
Copy
Download
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';

✅ Query caching (Snowflake caches results for 24 hours).


5. Data Sharing & Security

Q5: How do you share data securely in Snowflake?

Answer:

Secure Data Sharing (Provider-Consumer Model)

sql
Copy
Download
-- Provider creates a share
CREATE SHARE sales_data_share;

-- Grants access to objects
GRANT USAGE ON DATABASE sales_db TO SHARE sales_data_share;
GRANT SELECT ON TABLE sales_db.public.customers TO SHARE sales_data_share;

-- Adds consumer account
ALTER SHARE sales_data_share ADD ACCOUNTS = 'consumer_account';

Row-Level Security (RLS)

sql
Copy
Download
-- Create a policy to restrict data access
CREATE ROW ACCESS POLICY region_filter
AS (region VARCHAR) RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'ADMIN' OR region = CURRENT_REGION();

6. Time Travel & Cloning

Q6: How do you recover deleted data using Time Travel?

Answer:

Restore a Dropped Table

sql
Copy
Download
-- Check dropped tables
SHOW TABLES HISTORY LIKE 'customers';

-- Restore the table
UNDROP TABLE customers;

Query Historical Data

sql
Copy
Download
-- See data from 2 hours ago
SELECT * FROM customers AT(OFFSET => -60*60*2);

-- Restore to a previous state
CREATE TABLE customers_restored AS
SELECT * FROM customers BEFORE(TIMESTAMP => '2024-01-01 12:00:00'::TIMESTAMP);

Zero-Copy Cloning

sql
Copy
Download
-- Clone a table (no storage cost)
CREATE TABLE customers_dev CLONE customers;

-- Clone an entire database
CREATE DATABASE sales_dev CLONE sales_db;

7. Snowflake Administration

Q7: How do you monitor costs and usage?

Answer:

Check Warehouse Usage

sql
Copy
Download
SELECT * 
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
ORDER BY START_TIME DESC;

Monitor Storage Costs

sql
Copy
Download
SELECT * 
FROM SNOWFLAKE.ACCOUNT_USAGE.STORAGE_USAGE;

Set Budget Alerts

sql
Copy
Download
-- Use Snowflake's resource monitors
CREATE RESOURCE MONITOR budget_monitor
WITH CREDIT_QUOTA = 100
TRIGGERS 
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

No comments:

Post a Comment

Python using AI

  Python using AI - Prompts & Codes Tools useful for Python + AI ChatGPT - https://chatgpt.com/ Claude AI - https://claude.ai/new ...