DS Notebook 02

Posted on Wed 30 May 2018 in Projects

Notes on vectorization in NumPy, e.g. for an efficient kNN calculation:

kNN

ufuncs in NumPy (vectorized expressions instead of loops)

In [281]:
import numpy as np
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(values)
3.51 s ± 124 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [2]:
%timeit 1.0 / values
8.4 ms ± 988 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

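So the ufunc is much faster (roughly 400 times here). The explicit loop pays for type checks and function dispatches in the Python interpreter on every iteration, while the ufunc runs the same loop in compiled code. In a compiled language, the types would have been resolved before the code executes.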
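As a quick sanity check, the sketch below confirms that the vectorized expression returns the same values as the explicit loop. It assumes a much smaller array so the loop stays fast, and `reciprocals_loop` is just a name chosen for this illustration.

```python
import numpy as np

def reciprocals_loop(values):
    # Explicit Python loop: every iteration goes through the interpreter
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

rng = np.random.RandomState(0)
small = rng.randint(1, 100, size=1000)

loop_result = reciprocals_loop(small)
ufunc_result = 1.0 / small  # same computation, but the loop runs in compiled code

print(np.allclose(loop_result, ufunc_result))
```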

Ufuncs work with multi-dimensional arrays as well:

In [3]:
x = np.arange(9).reshape((3,3))
x
Out[3]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [4]:
x ** 2
Out[4]:
array([[ 0,  1,  4],
       [ 9, 16, 25],
       [36, 49, 64]], dtype=int32)

The following table lists the arithmetic operators implemented in NumPy:

Operator Equivalent ufunc Description
+ np.add Addition (e.g., 1 + 1 = 2)
- np.subtract Subtraction (e.g., 3 - 2 = 1)
- np.negative Unary negation (e.g., -2)
* np.multiply Multiplication (e.g., 2 * 3 = 6)
/ np.divide Division (e.g., 3 / 2 = 1.5)
// np.floor_divide Floor division (e.g., 3 // 2 = 1)
** np.power Exponentiation (e.g., 2 ** 3 = 8)
% np.mod Modulus/remainder (e.g., 9 % 4 = 1)
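A small sketch of what the table means in practice: each operator is shorthand for the ufunc in the same row, so both spellings should give identical arrays. The variable names are only for illustration.

```python
import numpy as np

x = np.arange(1, 6)  # array([1, 2, 3, 4, 5])

# Each operator is shorthand for the ufunc listed in the table
same_add = np.array_equal(x + 2, np.add(x, 2))
same_neg = np.array_equal(-x, np.negative(x))
same_floor = np.array_equal(x // 2, np.floor_divide(x, 2))
same_pow = np.array_equal(x ** 2, np.power(x, 2))
same_mod = np.array_equal(x % 2, np.mod(x, 2))

print(same_add, same_neg, same_floor, same_pow, same_mod)
```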

Additionally there are Boolean/bitwise operators; we will explore these in Comparisons, Masks, and Boolean Logic.

Aggregations

Function Name NaN-safe Version Description
np.sum np.nansum Compute sum of elements
np.prod np.nanprod Compute product of elements
np.mean np.nanmean Compute mean of elements
np.std np.nanstd Compute standard deviation
np.var np.nanvar Compute variance
np.min np.nanmin Find minimum value
np.max np.nanmax Find maximum value
np.argmin np.nanargmin Find index of minimum value
np.argmax np.nanargmax Find index of maximum value
np.median np.nanmedian Compute median of elements
np.percentile np.nanpercentile Compute rank-based statistics of elements
np.any N/A Evaluate whether any elements are true
np.all N/A Evaluate whether all elements are true
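To illustrate the NaN-safe column, here is a minimal sketch on a made-up four-element array with one missing value. The plain aggregate propagates the NaN, while the NaN-safe version skips it.

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0])

plain_sum = np.sum(data)      # NaN propagates through the sum
safe_sum = np.nansum(data)    # NaN is ignored: 1 + 2 + 4
safe_mean = np.nanmean(data)  # mean of the three valid values

print(plain_sum, safe_sum, safe_mean)
```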

Computation on Arrays: Broadcasting

In [5]:
M = np.ones((3,3))
In [6]:
a = [1,2,3]
In [7]:
M + a
Out[7]:
array([[2., 3., 4.],
       [2., 3., 4.],
       [2., 3., 4.]])
In [8]:
a = np.arange(3)
b = np.arange(3).reshape((3,1))

print(a)
print(b)
[0 1 2]
[[0]
 [1]
 [2]]
In [9]:
a + b
Out[9]:
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
  • Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
  • Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
  • Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
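The three rules can be checked on small arrays. The sketch below uses `np.broadcast` only to read off the resulting shape without computing anything; the arrays themselves are arbitrary examples.

```python
import numpy as np

M = np.ones((2, 3))
a = np.arange(3)                    # shape (3,)

# Rule 1: a is padded on the left -> (1, 3)
# Rule 2: the size-1 dimension is stretched -> (2, 3)
shape_ok = np.broadcast(M, a).shape

b = np.arange(3).reshape((3, 1))    # shape (3, 1)
# Both arrays are stretched: (1, 3) and (3, 1) -> (3, 3)
shape_both = np.broadcast(a, b).shape

# Rule 3: (3, 2) vs (3,) -> (3, 2) vs (1, 3); 2 != 3 and neither is 1
M2 = np.ones((3, 2))
try:
    M2 + a
    raised = False
except ValueError:
    raised = True

print(shape_ok, shape_both, raised)
```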
In [10]:
X = np.random.random((10,3))
In [11]:
Xmean = X.mean(0)
Xmean
Out[11]:
array([0.46110098, 0.49893467, 0.36013246])
In [12]:
X_centered = X - Xmean
X_centered
Out[12]:
array([[ 0.14736296,  0.21900803,  0.07023161],
       [-0.20869088,  0.4767899 , -0.24036926],
       [-0.2525321 , -0.25910342,  0.03020778],
       [ 0.15208766,  0.36143593, -0.0136786 ],
       [-0.16716128, -0.10618668,  0.33431049],
       [ 0.25124886,  0.14288258, -0.21493999],
       [ 0.45686835, -0.47092551,  0.17158316],
       [-0.11609482,  0.30802404, -0.28857307],
       [-0.25315294, -0.25342349,  0.41424483],
       [-0.0099358 , -0.41850138, -0.26301694]])
In [13]:
X_centered.mean(0)
Out[13]:
array([ 4.44089210e-17, -6.66133815e-17, -3.33066907e-17])
In [14]:
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 50).reshape(50, 1)
In [15]:
# use broadcasting to compute z across the grid:
z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
In [16]:
import matplotlib.pyplot as plt
In [17]:
plt.imshow(z, origin='lower', extent=[0, 5, 0, 5], cmap='viridis')
plt.colorbar();

Comparisons, Masks, Boolean Logic

Extract, modify, or otherwise manipulate values in an array based on some criterion. For example, count all values greater than a certain value, or remove all outliers above a threshold. In NumPy, Boolean masking is often the most efficient way to do this.

In [18]:
x = np.array([1, 2, 3, 4, 5])

The result of a comparison ufunc is always an array with a Boolean data type. All six standard comparisons are available: <, >, <=, >=, !=, and ==.

In [19]:
x > 3
Out[19]:
array([False, False, False,  True,  True])
In [20]:
(2 * x) == (x ** 2)
Out[20]:
array([False,  True, False, False, False])
In [21]:
rng = np.random.RandomState(0)
x = rng.randint(10, size=(3, 4))
x
Out[21]:
array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])
In [22]:
x < 6
Out[22]:
array([[ True,  True,  True,  True],
       [False, False,  True,  True],
       [ True,  True, False, False]])
In [23]:
np.sum(x < 6)
Out[23]:
8

This works because False is interpreted as 0 and True as 1, so summing a Boolean array counts its True entries.

In [24]:
# how many values less than 6 in each row?
np.sum(x < 6, axis=1)
Out[24]:
array([4, 2, 2])

np.any(), np.all(), and np.sum() can all be used this way. They are not the same as Python's built-in any(), all(), and sum(), which do not understand multiple dimensions, so be careful to use the NumPy versions on arrays.
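A short sketch of how the built-in and NumPy versions can disagree on a 2-D mask. The array is the same 3x4 example used above, typed out here so the snippet runs on its own.

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

mask = x < 6

total_np = np.sum(mask)      # counts every True in the 2-D mask
total_builtin = sum(mask)    # iterates over rows, so it adds them up column by column

print(total_np)
print(total_builtin)
```

The built-in returns one count per column rather than a single number, which is easy to miss.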

In [25]:
import pandas as pd
In [26]:
rainfall = pd.read_csv('Seattle2014.csv')['PRCP'].values
inches = rainfall / 254.0  # 1/10mm -> inches
inches.shape
Out[26]:
(365,)
In [27]:
np.sum((inches > 0.5) & (inches < 1))
Out[27]:
29

Operator Equivalent ufunc
& np.bitwise_and
| np.bitwise_or
^ np.bitwise_xor
~ np.bitwise_not

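As with the arithmetic operators, these are shorthand for ufuncs. A minimal sketch on two made-up Boolean masks:

```python
import numpy as np

x = np.arange(10)
low = x > 4
high = x < 8

same_and = np.array_equal(low & high, np.bitwise_and(low, high))
same_or = np.array_equal(low | high, np.bitwise_or(low, high))
same_xor = np.array_equal(low ^ high, np.bitwise_xor(low, high))
same_not = np.array_equal(~low, np.bitwise_not(low))

n_between = np.sum(low & high)   # counts the values 5, 6, 7

print(same_and, same_or, same_xor, same_not, n_between)
```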
In [28]:
print("Number days without rain:      ", np.sum(inches == 0))
print("Number days with rain:         ", np.sum(inches != 0))
print("Days with more than 0.5 inches:", np.sum(inches > 0.5))
print("Rainy days with < 0.2 inches  :", np.sum((inches > 0) &
                                                (inches < 0.2)))
Number days without rain:       215
Number days with rain:          150
Days with more than 0.5 inches: 37
Rainy days with < 0.2 inches  : 75
In [29]:
x
Out[29]:
array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])
In [30]:
x < 5
Out[30]:
array([[False,  True,  True,  True],
       [False, False,  True, False],
       [ True,  True, False, False]])
In [31]:
# masking operation to select from array
x[x < 5]
Out[31]:
array([0, 3, 3, 3, 2, 4])
In [32]:
# construct a mask of all rainy days
rainy = (inches > 0)

# construct a mask of all summer days (June 21st is the 172nd day)
days = np.arange(365)
summer = (days > 172) & (days < 262)

print("Median precip on rainy days in 2014 (inches):   ",
      np.median(inches[rainy]))
print("Median precip on summer days in 2014 (inches):  ",
      np.median(inches[summer]))
print("Maximum precip on summer days in 2014 (inches): ",
      np.max(inches[summer]))
print("Median precip on non-summer rainy days (inches):",
      np.median(inches[rainy & ~summer]))
Median precip on rainy days in 2014 (inches):    0.19488188976377951
Median precip on summer days in 2014 (inches):   0.0
Maximum precip on summer days in 2014 (inches):  0.8503937007874016
Median precip on non-summer rainy days (inches): 0.20078740157480315

One common point of confusion is the difference between the keywords and and or on one hand, and the operators & and | on the other hand. When would you use one versus the other?

The difference is this: and and or gauge the truth or falsehood of an entire object, while & and | refer to bits within each object.

When you use and or or, it's equivalent to asking Python to treat the object as a single Boolean entity. In Python, all nonzero integers will evaluate as True.

In [33]:
bool(42), bool(0)
Out[33]:
(True, False)
In [34]:
bool(42 and 0)
Out[34]:
False
In [35]:
bool(42 or 0)
Out[35]:
True

When you use & and | on integers, the expression operates on the bits of the element, applying the and or the or to the individual bits making up the number. In the cells below, the corresponding bits of the binary representations are compared in order to yield the result:

In [36]:
# bin() is binary representation of the number
bin(42)
Out[36]:
'0b101010'
In [37]:
bin(59)
Out[37]:
'0b111011'
In [38]:
bin(42 & 59)
Out[38]:
'0b101010'
In [39]:
A = np.array([1, 0, 1, 0, 1, 0], dtype=bool)
B = np.array([1, 1, 1, 0, 1, 1], dtype=bool)
A | B
Out[39]:
array([ True,  True,  True, False,  True,  True])
In [40]:
A or B
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-40-ea2c97d9d9ee> in <module>()
----> 1 A or B

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
In [41]:
x = np.arange(10)
(x > 4) & (x < 8)
Out[41]:
array([False, False, False, False, False,  True,  True,  True, False,
       False])
In [42]:
(x > 4) and (x < 8)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-eecf1fdd5fb4> in <module>()
----> 1 (x > 4) and (x < 8)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

So remember this: and and or perform a single Boolean evaluation on an entire object, while & and | perform multiple Boolean evaluations on the content (the individual bits or bytes) of an object.

For Boolean NumPy arrays, the latter is nearly always the desired operation.
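If a single Boolean for the whole array really is what you want, the error message above points at the fix: reduce each array with any() or all() first. A minimal sketch using the same A and B:

```python
import numpy as np

A = np.array([1, 0, 1, 0, 1, 0], dtype=bool)
B = np.array([1, 1, 1, 0, 1, 1], dtype=bool)

elementwise = A | B                 # one result per element
any_true = A.any() or B.any()       # one Boolean: is anything True in either array?
all_true = A.all() and B.all()      # one Boolean: is everything True in both arrays?

print(elementwise, any_true, all_true)
```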

Fancy Indexing

Ways to access and modify portions of arrays seen so far:

  • simple indices arr[0]
  • slices arr[:5]
  • Boolean masks arr[arr > 0]

Fancy indexing is just passing an array of indices to access multiple array elements at once:

In [ ]:
rand = np.random.RandomState(42)

x = rand.randint(100, size=10)
print(x)
In [ ]:
[x[3], x[7], x[2]]
In [ ]:
ind = [3, 7, 2]
x[ind]
In [ ]:
ind = np.array([[3, 7],
                [2, 6]])
x[ind]

It is always important to remember with fancy indexing that the return value reflects the broadcasted shape of the indices, rather than the shape of the array being indexed.
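A sketch of that point in two dimensions, using a made-up 3x4 array. Index arrays of the same shape are paired up element by element, while a column of row indices against a row of column indices broadcasts to a full grid.

```python
import numpy as np

X = np.arange(12).reshape((3, 4))

row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

pairs = X[row, col]                  # picks X[0, 2], X[1, 1], X[2, 3]
grid = X[row.reshape(3, 1), col]     # (3, 1) broadcast with (3,) -> result shape (3, 3)

print(pairs)
print(grid)
```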

In [52]:
x = np.zeros(10)
x[[0, 0]] = [4, 6]
x
Out[52]:
array([6., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
In [53]:
i = [3, 3, 3]
x[i] += 1   # this *assignment* happens 3 times, but not the augmentation
x
Out[53]:
array([6., 0., 0., 1., 0., 0., 0., 0., 0., 0.])

To get the augmentation to happen once per repeated index, use the at() method of the ufunc:

In [55]:
np.add.at(x, i, 1)
In [56]:
x
Out[56]:
array([6., 0., 0., 4., 0., 0., 0., 0., 0., 0.])

np.add.reduceat() is similar in spirit: it applies the reduction over consecutive slices of an array.
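A minimal sketch of reduceat() on a made-up array: the index list marks where each slice starts, and one sum comes back per slice.

```python
import numpy as np

x = np.arange(8)        # array([0, 1, 2, 3, 4, 5, 6, 7])
starts = [0, 4]

# Sums x[0:4] and x[4:], i.e. one reduction per slice
block_sums = np.add.reduceat(x, starts)

print(block_sums)
```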

Efficient manual histogram with searchsorted() and at()

In [62]:
np.random.seed(42)
x = np.random.randn(100)
x
Out[62]:
array([ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986, -0.23415337,
       -0.23413696,  1.57921282,  0.76743473, -0.46947439,  0.54256004,
       -0.46341769, -0.46572975,  0.24196227, -1.91328024, -1.72491783,
       -0.56228753, -1.01283112,  0.31424733, -0.90802408, -1.4123037 ,
        1.46564877, -0.2257763 ,  0.0675282 , -1.42474819, -0.54438272,
        0.11092259, -1.15099358,  0.37569802, -0.60063869, -0.29169375,
       -0.60170661,  1.85227818, -0.01349722, -1.05771093,  0.82254491,
       -1.22084365,  0.2088636 , -1.95967012, -1.32818605,  0.19686124,
        0.73846658,  0.17136828, -0.11564828, -0.3011037 , -1.47852199,
       -0.71984421, -0.46063877,  1.05712223,  0.34361829, -1.76304016,
        0.32408397, -0.38508228, -0.676922  ,  0.61167629,  1.03099952,
        0.93128012, -0.83921752, -0.30921238,  0.33126343,  0.97554513,
       -0.47917424, -0.18565898, -1.10633497, -1.19620662,  0.81252582,
        1.35624003, -0.07201012,  1.0035329 ,  0.36163603, -0.64511975,
        0.36139561,  1.53803657, -0.03582604,  1.56464366, -2.6197451 ,
        0.8219025 ,  0.08704707, -0.29900735,  0.09176078, -1.98756891,
       -0.21967189,  0.35711257,  1.47789404, -0.51827022, -0.8084936 ,
       -0.50175704,  0.91540212,  0.32875111, -0.5297602 ,  0.51326743,
        0.09707755,  0.96864499, -0.70205309, -0.32766215, -0.39210815,
       -1.46351495,  0.29612028,  0.26105527,  0.00511346, -0.23458713])
In [58]:
bins = np.linspace(-5, 5, 20)
bins
Out[58]:
array([-5.        , -4.47368421, -3.94736842, -3.42105263, -2.89473684,
       -2.36842105, -1.84210526, -1.31578947, -0.78947368, -0.26315789,
        0.26315789,  0.78947368,  1.31578947,  1.84210526,  2.36842105,
        2.89473684,  3.42105263,  3.94736842,  4.47368421,  5.        ])
In [67]:
counts = np.zeros_like(bins)
counts
Out[67]:
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.])
In [65]:
i = np.searchsorted(bins, x)
i
Out[65]:
array([11, 10, 11, 13, 10, 10, 13, 11,  9, 11,  9,  9, 10,  6,  7,  9,  8,
       11,  8,  7, 13, 10, 10,  7,  9, 10,  8, 11,  9,  9,  9, 14, 10,  8,
       12,  8, 10,  6,  7, 10, 11, 10, 10,  9,  7,  9,  9, 12, 11,  7, 11,
        9,  9, 11, 12, 12,  8,  9, 11, 12,  9, 10,  8,  8, 12, 13, 10, 12,
       11,  9, 11, 13, 10, 13,  5, 12, 10,  9, 10,  6, 10, 11, 13,  9,  8,
        9, 12, 11,  9, 11, 10, 12,  9,  9,  9,  7, 11, 10, 10, 10],
      dtype=int64)
In [68]:
np.add.at(counts, i, 1)
counts
Out[68]:
array([ 0.,  0.,  0.,  0.,  0.,  1.,  3.,  7.,  9., 23., 22., 17., 10.,
        7.,  1.,  0.,  0.,  0.,  0.,  0.])
In [69]:
bins
Out[69]:
array([-5.        , -4.47368421, -3.94736842, -3.42105263, -2.89473684,
       -2.36842105, -1.84210526, -1.31578947, -0.78947368, -0.26315789,
        0.26315789,  0.78947368,  1.31578947,  1.84210526,  2.36842105,
        2.89473684,  3.42105263,  3.94736842,  4.47368421,  5.        ])
In [86]:
import matplotlib.pyplot as plt
import seaborn; seaborn.set()  # for plot styling
In [87]:
plt.plot(bins, counts, drawstyle='steps');
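For comparison, the sketch below checks the manual counts against np.histogram on the same data. It assumes no sample lands exactly on a bin edge (true for these random values), since the two approaches treat edges differently. The manual version keeps one extra leading slot, hence the counts[1:].

```python
import numpy as np

rng = np.random.RandomState(42)
x = rng.randn(100)
bins = np.linspace(-5, 5, 20)

# Manual histogram, as above
counts = np.zeros_like(bins)
i = np.searchsorted(bins, x)
np.add.at(counts, i, 1)

# Library routine: returns len(bins) - 1 counts plus the bin edges
hist, edges = np.histogram(x, bins)

print(np.array_equal(counts[1:], hist))
```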

Sorting and Big-O

In [88]:
def selection_sort(x):
    for i in range(len(x)):
        swap = i + np.argmin(x[i:])
        (x[i], x[swap]) = (x[swap], x[i])
    return x
In [104]:
x = np.array([2, 1, 4, 3, 5])
selection_sort(x)
Out[104]:
array([1, 2, 3, 4, 5])

$N$ iterations: for i in range(len(x)):
Up to $N$ comparisons per iteration: np.argmin()
So this sort function is slow: $\mathcal{O}[N^2]$
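To make the $N^2$ scaling concrete, here is a plain-Python sketch of selection sort that counts comparisons. `selection_sort_counting` is a hypothetical helper written for this note, not part of NumPy, and it spells out the scan that np.argmin() does internally.

```python
def selection_sort_counting(values):
    # Plain-Python selection sort that also counts element comparisons
    x = list(values)
    comparisons = 0
    for i in range(len(x)):
        smallest = i
        for j in range(i + 1, len(x)):
            comparisons += 1
            if x[j] < x[smallest]:
                smallest = j
        x[i], x[smallest] = x[smallest], x[i]
    return x, comparisons

sorted_10, count_10 = selection_sort_counting(range(10, 0, -1))
sorted_20, count_20 = selection_sort_counting(range(20, 0, -1))

print(count_10, count_20)  # N * (N - 1) / 2 comparisons: 45 and 190
```

Doubling $N$ roughly quadruples the number of comparisons.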

By default, NumPy's np.sort() function uses an $\mathcal{O}[N\log N]$ quicksort algorithm:

In [108]:
x = np.array([2, 1, 4, 3, 5])
np.sort(x)
Out[108]:
array([1, 2, 3, 4, 5])
In [109]:
x #x is unaffected
Out[109]:
array([2, 1, 4, 3, 5])
In [110]:
# sort in-place using .sort() method:
x.sort()
In [111]:
x
Out[111]:
array([1, 2, 3, 4, 5])
In [115]:
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
i
Out[115]:
array([1, 0, 3, 2, 4], dtype=int64)
In [118]:
x[i]
Out[118]:
array([1, 2, 3, 4, 5])
In [119]:
rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))
print(X)
[[6 3 7 4 6 9]
 [2 6 7 4 3 7]
 [7 2 5 4 1 7]
 [5 1 4 0 9 5]]
In [120]:
np.sort(X, axis=0) # sort each column
Out[120]:
array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]])
In [122]:
np.sort(X, axis=1) # sort each row
Out[122]:
array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]])
In [123]:
x = np.array([7, 2, 3, 1, 6, 5, 4])
np.partition(x, 3)
Out[123]:
array([2, 1, 3, 4, 6, 5, 7])
In [124]:
np.partition(X, 2, axis=1)
Out[124]:
array([[3, 4, 6, 7, 6, 9],
       [2, 3, 4, 7, 6, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 9, 5]])

The result is an array where the first two slots in each row contain the smallest values from that row, with the remaining values filling the remaining slots.
Finally, just as there is a np.argsort that computes indices of the sort, there is a np.argpartition that computes indices of the partition:

In [126]:
np.argpartition(X, 2)
Out[126]:
array([[1, 3, 0, 2, 4, 5],
       [0, 4, 3, 2, 1, 5],
       [4, 1, 3, 2, 0, 5],
       [3, 1, 2, 0, 4, 5]], dtype=int64)
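A small sketch of what the partition indices guarantee, on the 1-D example from above: the first K indices point at the K smallest values (in arbitrary order), and index K points at the value that would sit at position K in a full sort.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
K = 3

idx = np.argpartition(x, K)

smallest_k = np.sort(x[idx[:K]])   # the K smallest values, put in order by sorting
kth_value = x[idx[K]]              # the element at position K of the fully sorted array

print(smallest_k, kth_value)
```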

Using np.newaxis to promote array to a higher dimension

In [163]:
a = np.arange(4)
a
Out[163]:
array([0, 1, 2, 3])
In [164]:
a.shape
Out[164]:
(4,)
In [168]:
row_vec = a[np.newaxis, :]
row_vec.shape
Out[168]:
(1, 4)
In [170]:
col_vec = a[:, np.newaxis]
col_vec.shape
Out[170]:
(4, 1)
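Combined with broadcasting, np.newaxis gives pairwise operations almost for free. The sketch below builds a multiplication table from a column vector and a row vector; the comparison against np.outer is only there as a check.

```python
import numpy as np

a = np.arange(4)

col_vec = a[:, np.newaxis]           # shape (4, 1)
same_as_reshape = np.array_equal(col_vec, a.reshape(4, 1))

# (4, 1) * (1, 4) broadcasts to a (4, 4) multiplication table
table = a[:, np.newaxis] * a[np.newaxis, :]

print(table.shape, np.array_equal(table, np.outer(a, a)))
```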

Example: kNN

In [262]:
X = rand.rand(200, 2)
In [263]:
plt.scatter(X[:, 0], X[:, 1], s=50);

Given $(x_1, y_1)$, $(x_2, y_2)$, ... $(x_n, y_n)$, first compute:
$x_1 - x_1$
$x_1 - x_2$
...
$x_n - x_n$

and

$y_1 - y_1$
$y_1 - y_2$
...
$y_n - y_n$

In [264]:
differences = X[:, np.newaxis, :] - X[np.newaxis, :, :]
differences.shape
Out[264]:
(200, 200, 2)

Then square them, giving $(x_1 - x_i)^2$ and $(y_1 - y_i)^2$:

In [265]:
sq_differences = differences ** 2
sq_differences.shape
Out[265]:
(200, 200, 2)

then add the squared coordinate differences, so we have $(x_1 - x_i)^2 + (y_1 - y_i)^2$

In [266]:
dist_sq = sq_differences.sum(-1)
dist_sq.shape
Out[266]:
(200, 200)
In [267]:
dist_sq.diagonal()
Out[267]:
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
In [268]:
nearest = np.argsort(dist_sq, axis=1)
print(nearest)
[[  0  75  79 ... 155  38  64]
 [  1 194  68 ...  91   3 180]
 [  2 127 141 ...  98 124   9]
 ...
 [197  99 119 ... 194 122  41]
 [198 178 121 ... 122  64  41]
 [199  51   2 ...  98 124   9]]
In [269]:
K = 3
nearest_partition = np.argpartition(dist_sq, K + 1, axis=1)
In [270]:
plt.scatter(X[:, 0], X[:, 1], s=50)

# draw lines from each point to its K nearest neighbors
K = 3

for i in range(X.shape[0]):
    for j in nearest_partition[i, :K+1]:
        # plot a line from X[i] to X[j]
        # use some zip magic to make it happen:
        plt.plot(*zip(X[j], X[i]), color='black', lw=1)
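As a check on the partition shortcut, the sketch below compares it with the full argsort on a smaller made-up point set (10 points, K = 2). It assumes no two distances are exactly tied, which holds for random points in practice.

```python
import numpy as np

rng = np.random.RandomState(42)
X = rng.rand(10, 2)
K = 2

# Pairwise squared distances via broadcasting: (10, 1, 2) - (1, 10, 2)
dist_sq = ((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2).sum(-1)

full_sort = np.argsort(dist_sq, axis=1)
partial = np.argpartition(dist_sq, K + 1, axis=1)

# Same K + 1 closest indices per row (the point itself plus K neighbors),
# though argpartition does not guarantee their order
agree = all(set(full_sort[i, :K + 1]) == set(partial[i, :K + 1])
            for i in range(X.shape[0]))

print(agree)
```

The full sort does more work than needed when only the K nearest neighbors matter, which is why the partition is used above.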

Structured Data

In [271]:
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)
[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]

Here 'U10' translates to "Unicode string of maximum length 10," 'i4' translates to "4-byte (i.e., 32 bit) integer," and 'f8' translates to "8-byte (i.e., 64 bit) float."

In [272]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]
In [273]:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)
[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]
In [274]:
data['name']
Out[274]:
array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')
In [275]:
data[0]
Out[275]:
('Alice', 25, 55.)
In [279]:
data[-1][['name', 'age']]
Out[279]:
('Doug', 19)
In [280]:
data[data['age'] < 30]['name']
Out[280]:
array(['Alice', 'Doug'], dtype='<U10')

The shortened string format codes may seem confusing, but they are built on simple principles. The first (optional) character is < or >, which means "little endian" or "big endian," respectively, and specifies the byte-ordering convention. The next character specifies the type of data: characters, bytes, ints, floating points, and so on (see the table below). The last character or characters represent the size of the object in bytes.

Character Description Example
'b' Byte np.dtype('b')
'i' Signed integer np.dtype('i4') == np.int32
'u' Unsigned integer np.dtype('u1') == np.uint8
'f' Floating point np.dtype('f8') == np.float64
'c' Complex floating point np.dtype('c16') == np.complex128
'S', 'a' String np.dtype('S5')
'U' Unicode string np.dtype('U') == np.str_
'V' Raw data (void) np.dtype('V') == np.void
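A few of the codes can be checked directly. This sketch compares the short codes with the named NumPy types and reads the size in bytes from itemsize; the `checks` dictionary is only an illustration device.

```python
import numpy as np

checks = {
    "i4 is int32": np.dtype('i4') == np.int32,
    "u1 is uint8": np.dtype('u1') == np.uint8,
    "f8 is float64": np.dtype('f8') == np.float64,
    "c16 is complex128": np.dtype('c16') == np.complex128,
}

# itemsize reports the size in bytes
f8_bytes = np.dtype('f8').itemsize      # 8
u10_bytes = np.dtype('U10').itemsize    # 40: NumPy stores 4 bytes per Unicode character

for label, ok in checks.items():
    print(label, ok)
print(f8_bytes, u10_bytes)
```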