SQL vs pandas
Posted on Thu 22 December 2016 in Projects
SQL vs. pandas¶
- SQL for relational databases, pandas for Dataframes
- mutability.
- debugging: pandas
- pandas: experimental.
- SQL: industry standard(?)
# Standard slicing notations apply
df.loc[]
df.iloc[]
df.groupby()
SQL: Structured Query Language¶
Language used to pull data from Relational databases like:
- postgres
- MySQL
OracleDB
"host our data in MySQL" (so, SQL is assumed)
- run SQL commands, queries
- Customers --> CustomerID, Emails --> EmailID (singular by convention)
https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_all
https://www.w3schools.com/sql/default.asp
- relational (80%): tables and indices to orgainze and store
- non-relational (20%): (no tables)
- schema: how to store data
- relational: each table has unique identifier, ID, and is connected to others.
- fields/attributes: FN, LN, age, email, etc. (column names)
- crows foot notation --<
- non-relational database example: MongoDB, NoSQL
4 Operations: GET, UPDATE, INSERT, DESTROY
Many Keywords, for example DELETE FROM
Sample GET Query
SELECT * FROM Customers WHERE Country = 'Germany' ORDER BY PostalCode DESC
SELECT * FROM Products
WHERE Price >= 50
ORDER BY ProductName DESC;
Sample UPDATE Query
UPDATE (table name)
SET (column name) = (value)
WHERE (column name) = (value)
ex.
UPDATE Products
SET Unit = '16 lb pkgs.'
WHERE ProductID = 18
UPDATE OrderDetails
SET Quantity = 12
WHERE OrderID = 10248
^ OrderID isn't native to OrderDetails table
Sample INSERT Query
INSERT INTO (table name)
(column 1 name), (column 2 name), (column 3 name)
VALUES (value 1, value 2, value 3)
INSERT INTO OrderDetails
(OrderID, ProductID, Quantity)
VALUES
(1,2,3),
(4,5,6)
^ add a 2nd row
^ database auto-increments native ID
Sample DESTROY Query
DELETE FROM (table name)
WHERE (condition)
DELETE FROM OrderDetails
WHERE OrderID = 10248
^ (must delete entire row)
^ (table does not update all IDs to fill gaps)
^ (foreign keys don't get deleted)
- Do SELECT queries first, before UPDATING or DESTROYING, so you don't mess up
- run query locally before on production database (!!)
Joins¶
(same or similar to pandas merges)
Sample INNER Join (merge related tables at a shared column into a single table)
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID
Sample LEFT Join (return all rows from left table, and matching records from the right table) This returns all customers and any OrderIDs that may by associated. If that customer didn't have an OrderID, null is inserted for OrderID
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
Sample RIGHT Join (return all rows from the right table, and matching records from the left table) This returns all employees and any orders they may have placed.
SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;
Sample OUTER Join to combine all rows from one or more tables. no rows will be left out, all will be included in the joined tables.
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;