1. Snowflake Basics
Q1: What is Snowflake, and how is it different from traditional databases?
Answer:
Snowflake is a cloud-based data warehouse built for scalability, performance, and ease of use.
Key differences:
Separates storage and compute, so each scales (and is billed) independently — unlike traditional databases.
Automatic scaling of compute; no manual sharding, indexing, or partition management.
Supports semi-structured data (JSON, Avro, Parquet) natively via the VARIANT type.
Zero-copy cloning (duplicate databases/tables instantly; clones share storage until data changes).
Time Travel (query or recover deleted/modified data within the retention period).
2. Setting Up Snowflake
Q2: How do you create a database, table, and warehouse in Snowflake?
Answer:
Create a Database
CREATE DATABASE sales_db;
Create a Table
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,  -- note: constraints are informational only; Snowflake does not enforce them
    name VARCHAR(100),
    email VARCHAR(100),
    signup_date DATE
);
Create a Virtual Warehouse (Compute Cluster)
CREATE WAREHOUSE analytics_wh WITH
    WAREHOUSE_SIZE = 'X-SMALL'
    AUTO_SUSPEND = 300   -- pauses after 5 minutes (300 seconds) of inactivity
    AUTO_RESUME = TRUE;
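To verify the setup, you can point your session at the new objects and run a quick round trip (object names are from the examples above; the sample row is illustrative):

USE WAREHOUSE analytics_wh;
USE DATABASE sales_db;

-- Insert a sample row and read it back
INSERT INTO customers (customer_id, name, email, signup_date)
VALUES (1, 'Ada Lovelace', 'ada@example.com', '2024-01-15');

SELECT * FROM customers;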
3. Data Loading & Querying
Q3: How do you load CSV and JSON data into Snowflake?
Answer:
Loading CSV Data
-- 1. Create a stage (cloud storage reference) with a CSV file format
CREATE STAGE my_csv_stage
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- 2. Upload a local file to the stage (PUT runs from SnowSQL, not the web UI; path is an example)
PUT file:///tmp/customers.csv @my_csv_stage;

-- 3. Load the staged file into the table
COPY INTO customers FROM @my_csv_stage/customers.csv;
Loading JSON Data
-- 1. Create a table with a VARIANT column
CREATE TABLE json_data (
    raw_data VARIANT
);

-- 2. Load the JSON file from a stage
COPY INTO json_data
FROM @my_stage/data.json
FILE_FORMAT = (TYPE = 'JSON');
Querying JSON Data
SELECT
    raw_data:customer_id::INT    AS customer_id,
    raw_data:address.city::STRING AS city
FROM json_data;
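For JSON arrays, a common companion is LATERAL FLATTEN, which expands each array element into its own row. The orders array below is a hypothetical field for illustration, not part of the sample data above:

SELECT
    raw_data:customer_id::INT    AS customer_id,
    o.value:order_id::INT        AS order_id,      -- one output row per array element
    o.value:amount::NUMBER(10,2) AS amount
FROM json_data,
     LATERAL FLATTEN(input => raw_data:orders) o;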
4. Advanced Querying & Optimization
Q4: How do you optimize query performance in Snowflake?
Answer:
Best Practices:
✅ Use clustering keys (for large tables):
ALTER TABLE large_table CLUSTER BY (date_column);
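You can check how well a table is clustered with the built-in SYSTEM$CLUSTERING_INFORMATION function (table and column names are from the example above):

-- Returns JSON with depth and overlap statistics for the clustering key
SELECT SYSTEM$CLUSTERING_INFORMATION('large_table', '(date_column)');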
✅ Materialized views (for frequent queries):
-- Requires Enterprise edition or higher
CREATE MATERIALIZED VIEW mv_sales_summary AS
SELECT product_id, SUM(amount) AS total_amount
FROM sales
GROUP BY product_id;
✅ Warehouse scaling (for heavy workloads):
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';
✅ Query result caching (Snowflake serves identical queries from the result cache for up to 24 hours at no compute cost, as long as the underlying data hasn't changed).
5. Data Sharing & Security
Q5: How do you share data securely in Snowflake?
Answer:
Secure Data Sharing (Provider-Consumer Model)
-- Provider creates a share
CREATE SHARE sales_data_share;

-- Grant access to objects (usage on the database AND schema is required before table grants)
GRANT USAGE ON DATABASE sales_db TO SHARE sales_data_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_data_share;
GRANT SELECT ON TABLE sales_db.public.customers TO SHARE sales_data_share;

-- Add the consumer account
ALTER SHARE sales_data_share ADD ACCOUNTS = consumer_account;
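On the consumer side, the share is mounted as a read-only database. Here provider_account stands in for the provider's actual account identifier:

-- Consumer creates a database from the share
CREATE DATABASE shared_sales FROM SHARE provider_account.sales_data_share;

-- Shared data is queried like any local table (read-only)
SELECT * FROM shared_sales.public.customers;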
Row-Level Security (RLS)
-- Create a policy that restricts which rows a role can see
CREATE ROW ACCESS POLICY region_filter AS (region VARCHAR)
RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'ADMIN'
    OR region = CURRENT_REGION();  -- CURRENT_REGION() returns the account's cloud region (e.g. 'AWS_US_WEST_2')

-- The policy only takes effect once attached to a table column
-- (assumes the sales table has a region column)
ALTER TABLE sales ADD ROW ACCESS POLICY region_filter ON (region);
6. Time Travel & Cloning
Q6: How do you recover deleted data using Time Travel?
Answer:
Restore a Dropped Table
-- Check recently dropped tables
SHOW TABLES HISTORY LIKE 'customers';

-- Restore the table
UNDROP TABLE customers;
Query Historical Data
-- Query data as it was 2 hours ago (offset is in seconds)
SELECT * FROM customers AT(OFFSET => -60*60*2);

-- Recreate the table as of a point in time
CREATE TABLE customers_restored AS
SELECT * FROM customers BEFORE(TIMESTAMP => '2024-01-01 12:00:00'::TIMESTAMP);
Zero-Copy Cloning
-- Clone a table (no additional storage until the clone's data diverges)
CREATE TABLE customers_dev CLONE customers;

-- Clone an entire database
CREATE DATABASE sales_dev CLONE sales_db;
7. Snowflake Administration
Q7: How do you monitor costs and usage?
Answer:
Check Warehouse Usage
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
ORDER BY START_TIME DESC;
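To see where credits are actually going, aggregate the same view per warehouse (column names are from the ACCOUNT_USAGE schema; the 30-day window is an example):

SELECT
    WAREHOUSE_NAME,
    SUM(CREDITS_USED) AS total_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE START_TIME >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY WAREHOUSE_NAME
ORDER BY total_credits DESC;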
Monitor Storage Costs
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.STORAGE_USAGE;
Set Budget Alerts
-- Create a resource monitor with a credit quota
CREATE RESOURCE MONITOR budget_monitor
WITH CREDIT_QUOTA = 100
TRIGGERS
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

-- The monitor only applies once assigned to a warehouse (or the account)
ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = budget_monitor;