numpy

Numpy: Filtering by multiple conditions

Besides np.where(), you can filter a NumPy array by indexing it with a boolean condition (a mask), using the following syntax :

ary[ condition ]

For example :

ary[ ary > 10 ]

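So far so good, but how do you apply multiple conditions? Wrap each condition in its own parentheses, as shown below, and combine them with the bitwise operators & (and), | (or) and ~ (not) instead of Python's and/or/not keywords. The parentheses are required because the bitwise operators bind more tightly than comparisons such as > and <.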

import numpy as np

a = np.arange(5,15)

print(a)

#one condition
print(a[a > 10])
#multiple conditions
print(a[(a > 10) & (a < 13)])
print(a[(a == 10) | (a == 13)])

-----

[ 5  6  7  8  9 10 11 12 13 14]
[11 12 13 14]
[11 12]
[10 13]
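To round this out, here is a small sketch of negating a mask with ~, the function form np.logical_and(), and what happens if you reach for Python's `and` by mistake. The variable names (not_in_range, in_range, raised) are mine, chosen only for illustration.

```python
import numpy as np

a = np.arange(5, 15)

# ~ negates a boolean mask element-wise (logical NOT)
not_in_range = a[~((a > 10) & (a < 13))]

# np.logical_and / np.logical_or are function equivalents of & and |
in_range = a[np.logical_and(a > 10, a < 13)]

# Python's `and` tries to collapse each array to a single bool and fails
try:
    a[(a > 10) and (a < 13)]
    raised = False
except ValueError:
    raised = True

print(not_in_range)
print(in_range)
print(raised)
```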

Numpy : min/max of integer types

np.iinfo() reports the minimum and maximum values of each NumPy integer type :

In [235]: np.iinfo(np.int8)                                                                                                                                                  
Out[235]: iinfo(min=-128, max=127, dtype=int8)

In [236]: np.iinfo(np.int16)                                                                                                                                                 
Out[236]: iinfo(min=-32768, max=32767, dtype=int16)

In [237]: np.iinfo(np.int32)                                                                                                                                                 
Out[237]: iinfo(min=-2147483648, max=2147483647, dtype=int32)

In [238]: np.iinfo(np.int64)                                                                                                                                                 
Out[238]: iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)

In [239]: np.iinfo(np.int_)  # np.int was removed in NumPy 1.24; use np.int_ or the built-in int
Out[239]: iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
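One place these limits come in handy is checking whether a value fits into a type before casting. The sketch below assumes a small helper I am calling fits(); it is not part of NumPy, just an illustration built on np.iinfo().

```python
import numpy as np

def fits(value, dtype):
    """Hypothetical helper: True if the integer `value` is representable in `dtype`."""
    info = np.iinfo(dtype)
    return info.min <= value <= info.max

print(fits(127, np.int8))    # 127 is the largest int8 value
print(fits(128, np.int8))    # one past the maximum
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)
```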

Stats functions

A quick collection of statistics functions :

import numpy as np

class stats():

	"""
		Calculate the deviations in a sample.
	"""
	@staticmethod
	def dev(xs): return xs - np.mean(xs)

	"""
		Calculate the covariance between two data sets.
			sample=True : sample covariance
			sample=False: population covariance
	"""
	@staticmethod
	def cov(xs,ys,sample=True):
		dec = 1 if sample else 0 #if sample-cov decrement len by 1
		#sum of products of deviations
		return np.dot( stats.dev(xs), stats.dev(ys) ) / (len(xs) - dec)

	@staticmethod
	def var(xs,sample=True):
		dec = 1 if sample else 0 #if sample variance, decrement len by 1
		return np.sum(stats.dev(xs)**2) / (len(xs) - dec)

	@staticmethod
	def std_dev(xs,sample=True):
		return np.sqrt( stats.var(xs, sample) )

	"""
		Calculate Pearson correlation.
	"""
	@staticmethod
	def corr(xs,ys,sample=True) :
		#use the same sample/population setting as the covariance below
		varx = stats.var(xs, sample)
		vary = stats.var(ys, sample)

		corr = stats.cov(xs,ys,sample)/ np.sqrt(varx * vary)
		return corr

	@staticmethod
	def rank(xs): return np.argsort(np.argsort(xs)) #0-based ranks; ties get distinct ranks, they are not averaged
	@staticmethod
	def spearman_corr(xs,ys,sample=True):
		xranks = stats.rank(xs)
		yranks = stats.rank(ys)
		return stats.corr(xranks,yranks,sample)

	@staticmethod
	def r2(ys,residuals): #coef of determination
		return 1 - stats.var(residuals) / stats.var(ys)

	"""
		Calculate auto correlation coefficients for a time series or list of values.
			lag : specifies up to how many lagging coef will be calculated.
				!! The lag should be at most the length of the data minus 2; lag zero is skipped.
	"""
	@staticmethod
	def auto_corr(xs,lag=1,sample=True):
		if lag > len(xs) - 2 : raise Exception("Lag(%s) is bigger than the data-len(%s) - 2" % (lag,len(xs)))
		ac = np.zeros(lag)
		for i in range(1, lag+1) :
			ac[i-1] = stats.corr(xs[:-i], xs[i:], sample)
		return ac
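If you want to sanity-check formulas like the ones in the stats class, NumPy's built-ins are a handy reference. The sketch below recomputes the sample variance, covariance and Pearson correlation by hand on made-up data (xs and ys are arbitrary illustration values) and compares them with np.var(ddof=1), np.cov() and np.corrcoef().

```python
import numpy as np

# arbitrary illustration data
xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# deviations from the mean
dx = xs - xs.mean()
dy = ys - ys.mean()

# sample statistics: divide by n - 1
n = len(xs)
var_x = np.sum(dx ** 2) / (n - 1)
var_y = np.sum(dy ** 2) / (n - 1)
cov_xy = np.dot(dx, dy) / (n - 1)
corr_xy = cov_xy / np.sqrt(var_x * var_y)

# NumPy equivalents: ddof=1 selects the sample variance,
# np.cov defaults to the sample covariance
print(np.isclose(var_x, np.var(xs, ddof=1)))
print(np.isclose(cov_xy, np.cov(xs, ys)[0, 1]))
print(np.isclose(corr_xy, np.corrcoef(xs, ys)[0, 1]))
```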

Pretty printing 2D python/numpy array

Below is a quick and dirty way to print a 2D array together with its column and row labels/indexes. Having those visible makes it much easier to track the results of operations by eye.

First, let's try it with a NumPy array :

import numpy as np
import pandas as pd

a = np.random.randint(0,100,(5,5))

print(a)

print()
print(pd.DataFrame(a))


print()
print(pd.DataFrame(a,columns=['A','B','C','D','E']))

-----

[[70 40 64 22 91]
 [82 41 35 42 19]
 [21  7 42 63 85]
 [26 43 23  1 34]
 [44 79 88 46 62]]

    0   1   2   3   4
0  70  40  64  22  91
1  82  41  35  42  19
2  21   7  42  63  85
3  26  43  23   1  34
4  44  79  88  46  62

    A   B   C   D   E
0  70  40  64  22  91
1  82  41  35  42  19
2  21   7  42  63  85
3  26  43  23   1  34
4  44  79  88  46  62

It works the same way for plain (nested) Python lists :

import numpy as np
import pandas as pd

b = [[1,2],[3,4]]

print()
print(pd.DataFrame(b,columns=['A','B']))

-----

   A  B
0  1  2
1  3  4

If you would rather not type this in, a runnable copy is here : https://onecompiler.com/python/3xm3ms6fb
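The examples above label only the columns. Row labels can be set the same way through the index argument of pd.DataFrame. A minimal sketch, where the labels r0/r1 and the small array are made up for illustration :

```python
import numpy as np
import pandas as pd

a = np.arange(6).reshape(2, 3)

# index= sets the row labels, columns= sets the column labels
df = pd.DataFrame(a, index=["r0", "r1"], columns=["A", "B", "C"])
print(df)

# the labels can then be used for lookups
print(df.loc["r1", "C"])
```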

Calculating Moving (Average) ++

If you have time series data and want to apply a moving function such as a moving average, you can build a rolling window as shown below. The helper uses as_strided to create overlapping views of the data without copying it.

import numpy as np

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.arange(0,10)

print(rolling_window(a,5))

print(np.mean(rolling_window(a,5), axis=1))

-----
[[0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]
 [4 5 6 7 8]
 [5 6 7 8 9]]

[2. 3. 4. 5. 6. 7.]


b = np.random.randint(0,100,10)

print(rolling_window(b,5))

print(np.mean(rolling_window(b,5), axis=1))

-----
[[42 93 30 69 53]
 [93 30 69 53 93]
 [30 69 53 93 61]
 [69 53 93 61 22]
 [53 93 61 22 53]
 [93 61 22 53 71]]

[57.4 67.6 61.2 59.6 56.4 60. ]
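as_strided does no bounds checking, so a wrong shape or stride can silently read memory outside the array. If your NumPy is 1.20 or newer, sliding_window_view from the same module builds the same windows more safely. A sketch, assuming that version is available :

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(0, 10)

# read-only view of all length-5 windows, shape (6, 5)
windows = sliding_window_view(a, 5)
means = windows.mean(axis=1)

print(windows.shape)
print(means)
```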

Here is another way to compute a moving average, this time using a cumulative sum :

import numpy as np

def moving_average(a, n=3) :
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

print(moving_average([1,2,3,4,5,4,3,2,1]))

-------

[2.         3.         4.         4.33333333 4.         3.         2.        ]
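A third option, not from the code above but a common alternative, is to convolve the data with a flat kernel of weights 1/n. The function name moving_average_conv is my own; np.convolve with mode="valid" keeps only the positions where the window fully overlaps the data.

```python
import numpy as np

def moving_average_conv(a, n=3):
    # each output value is the mean of n consecutive inputs
    kernel = np.ones(n) / n
    return np.convolve(a, kernel, mode="valid")

result = moving_average_conv([1, 2, 3, 4, 5, 4, 3, 2, 1])
print(np.round(result, 2))
```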

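Search for multiple values in a 2D array

To select the rows whose first-column value matches any of several candidates, build a boolean mask with np.isin() :

np.isin(data[:, 0], match)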

Example:

data = np.array([[1, 4],[5, 2],[2, 4]])
match = np.array([2, 4])

np.isin(data[:,0], match)
array([False, False,  True])

data[np.isin(data[:, 0], match)]
array([[2, 4]])
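The same mask can be inverted or applied to every column at once. A short sketch using the same data and match arrays; the result names (not_matching, all_cols, any_col) are just for illustration :

```python
import numpy as np

data = np.array([[1, 4], [5, 2], [2, 4]])
match = np.array([2, 4])

# rows whose first column is NOT in match
not_matching = data[~np.isin(data[:, 0], match)]

# test every element, then keep rows where all columns are in match
all_cols = data[np.isin(data, match).all(axis=1)]

# keep rows where at least one column is in match
any_col = data[np.isin(data, match).any(axis=1)]

print(not_matching)
print(all_cols)
print(any_col)
```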