SQL vs pandas

Posted on Thu 22 December 2016 in Projects

SQL vs. pandas

  • SQL for relational databases, pandas for Dataframes
  • mutability.
  • debugging: pandas
  • pandas: experimental.
  • SQL: industry standard(?)

Syntax

Pandas

In [ ]:
# Standard slicing notations apply
df.loc[]
df.iloc[]
df.groupby()

SQL: Structured Query Language

Language used to pull data from Relational databases like:

  • postgres
  • MySQL
  • OracleDB

  • "host our data in MySQL" (so, SQL is assumed)

  • run SQL commands, queries
  • Customers --> CustomerID, Emails --> EmailID (singular by convention)

https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_all
https://www.w3schools.com/sql/default.asp

  • relational (80%): tables and indices to orgainze and store
  • non-relational (20%): (no tables)
  • schema: how to store data
  • relational: each table has unique identifier, ID, and is connected to others.
  • fields/attributes: FN, LN, age, email, etc. (column names)
  • crows foot notation --<
  • non-relational database example: MongoDB, NoSQL

4 Operations: GET, UPDATE, INSERT, DESTROY

Many Keywords, for example DELETE FROM

Sample GET Query

SELECT * FROM Customers WHERE Country = 'Germany' ORDER BY PostalCode DESC
SELECT * FROM Products
WHERE Price >= 50
ORDER BY ProductName DESC;

Sample UPDATE Query

UPDATE (table name)
SET (column name) = (value)
WHERE (column name) = (value)

ex.

UPDATE Products
SET Unit = '16 lb pkgs.'
WHERE ProductID = 18
UPDATE OrderDetails
SET Quantity = 12
WHERE OrderID = 10248

^ OrderID isn't native to OrderDetails table

Sample INSERT Query

INSERT INTO (table name)
(column 1 name), (column 2 name), (column 3 name)
VALUES (value 1, value 2, value 3)
INSERT INTO OrderDetails
(OrderID, ProductID, Quantity)
VALUES
(1,2,3),  
(4,5,6)

^ add a 2nd row
^ database auto-increments native ID

Sample DESTROY Query

DELETE FROM (table name)
WHERE (condition)
DELETE FROM OrderDetails
WHERE OrderID = 10248

^ (must delete entire row)
^ (table does not update all IDs to fill gaps)
^ (foreign keys don't get deleted)

  • Do SELECT queries first, before UPDATING or DESTROYING, so you don't mess up
  • run query locally before on production database (!!)

Joins

(same or similar to pandas merges)

Sample INNER Join (merge related tables at a shared column into a single table)

SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID

Sample LEFT Join (return all rows from left table, and matching records from the right table) This returns all customers and any OrderIDs that may by associated. If that customer didn't have an OrderID, null is inserted for OrderID

SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;

Sample RIGHT Join (return all rows from the right table, and matching records from the left table) This returns all employees and any orders they may have placed.

SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;

Sample OUTER Join to combine all rows from one or more tables. no rows will be left out, all will be included in the joined tables.

SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;
In [ ]:
 
In [ ]:
 
In [ ]: