
NumPy: min/max of integer types

Here is how to find the minimum and maximum values of NumPy's integer types with np.iinfo:

In [235]: np.iinfo(np.int8)
Out[235]: iinfo(min=-128, max=127, dtype=int8)

In [236]: np.iinfo(np.int16)
Out[236]: iinfo(min=-32768, max=32767, dtype=int16)

In [237]: np.iinfo(np.int32)
Out[237]: iinfo(min=-2147483648, max=2147483647, dtype=int32)

In [238]: np.iinfo(np.int64)
Out[238]: iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)

In [239]: np.iinfo(np.int_)
Out[239]: iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)

Note: np.int was only an alias for Python's built-in int and was removed in NumPy 1.24. np.int_ is NumPy's default integer type: int64 on 64-bit Linux and macOS, but int32 on Windows before NumPy 2.0.
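A small sketch of how np.iinfo can be put to use: checking whether a Python integer fits into a given dtype before casting, plus the bounds of the unsigned types (whose minimum is always 0). The helper name fits_in is hypothetical, not part of NumPy.

```python
import numpy as np

def fits_in(value, dtype):
    """Return True if the integer `value` lies within the range of `dtype` (hypothetical helper)."""
    info = np.iinfo(dtype)  # raises ValueError for non-integer dtypes such as float
    return info.min <= value <= info.max

# Unsigned types: min is 0, max is 2**bits - 1
for t in (np.uint8, np.uint16, np.uint32, np.uint64):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

print(fits_in(127, np.int8))   # True
print(fits_in(128, np.int8))   # False
print(fits_in(-1, np.uint8))   # False
```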

Perl: __DATA__ section

This is my first Perl post. I want to explore a lesser-known feature of Perl that is very useful at times.

Let's say, as in my case, that you need to combine and then compress .js and .css files scattered all over the place.
In other words, instead of importing many different files in your HTML, you want to import just one compressed JavaScript file and one compressed CSS file.

Your first instinct may be to create a hash describing all the file locations… but maybe the quick-and-dirty approach is better. Enter the Perl __DATA__ section.

It is a very clever idea: what if you could treat part of the source file as a separate file, i.e. a file within a file? Simply put, everything after the __DATA__ token can be read through the DATA filehandle.

Writing @array = <DATA> slurps everything into @array, one line per element. In the example below I also remove the trailing newline from each line.

So how do we set up which files to compress? Simple: first type __DATA__ at the end of the script, then use any command you like to append the file paths. I used these:

  • ls -w1 -d $PWD/blah/*.js >> compress.pl
  • ls -w1 -d $PWD/blah/*.css >> compress.pl

…several times. Don't forget to use append (>>), not overwrite (>), or you will wipe out the script itself!

What the script does:

  • read the __DATA__ section into the @files array
  • keep the .js and .css files and join each group into a space-separated string
  • use those strings to build the commands that concatenate each group
  • run the tools that compress the concatenated files
#!/usr/bin/env perl
use strict;
use warnings;

# print a shell command, run it, and stop if it fails
sub run {
	my $cmd = shift @_;
	print "> $cmd\n";
	system($cmd) == 0 or die "command failed: $cmd\n";
}

#concatenated files
my $cat_js = 'cat.js';
my $cat_css = 'cat.css';
my $out = 'out';
#define compress commands
my $js_cmd = "/usr/local/bin/uglifyjs -c -m -- $cat_js > $out.js";
my $css_cmd = "yui-compressor --type css $cat_css > $out.css";

#read the DATA section: strip newlines and skip blank lines
chomp(my @files = <DATA>);
@files = grep { /\S/ } @files;
#extract js and css files (match the extension, not just the last letters)
my $js_files  = join ' ', grep { /\.js$/ } @files;
my $css_files = join ' ', grep { /\.css$/ } @files;

#concatenate, then compress; skip an empty group,
#otherwise cat would sit waiting on STDIN
if ($js_files) {
	run("cat $js_files > $cat_js");
	run($js_cmd);
}
if ($css_files) {
	run("cat $css_files > $cat_css");
	run($css_cmd);
}

#ls -w1 -d $PWD/*.*  >> compress.pl

__DATA__

One more thing: what is the purpose of the first line?

Shebang

If you make the file executable (chmod +x compress.pl) and have:

#!/usr/bin/perl

or

#!/usr/bin/env perl

as the first line, then instead of calling the script like this:

perl compress.pl

you can run it directly:

./compress.pl

Prolog: Saving the fact+rules to file

Sometimes you may wish to save the current state of your program (its facts and rules) to a file. Here is how:

% save(+Heads, +File): write the clauses of Heads (a predicate indicator such as foo/2) to File
save(Heads,File) :- tell(File), listing(Heads), told.

You can use the same idea to redirect the output of other predicates to a file.

Enums

Python versions before 3.4 have no built-in enumeration type, but the problem is easy to solve:

def enum(args, start=0):
	class Enum(object):
		__slots__ = args.split()
		names = {}

		def __init__(self):
			for i, key in enumerate(Enum.__slots__, start):
				self.names[i] = key
				setattr(self, key, i)

	return Enum()


e = enum('ONE TWO THREE', start=1)

print(e.ONE)
print(e.TWO)
print(e.THREE)

-----

1
2
3
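Since Python 3.4 the standard library ships an enum module, so the hand-rolled version above is mainly useful on older interpreters. A rough equivalent of enum('ONE TWO THREE', start=1) using the functional API of enum.Enum (the class name Numbers is just an example):

```python
from enum import Enum

# Functional API: member names as a space-separated string, numbered from `start`
Numbers = Enum('Numbers', 'ONE TWO THREE', start=1)

print(Numbers.ONE.value)    # 1
print(Numbers.THREE.value)  # 3
print(Numbers(2).name)      # TWO (reverse lookup, like the `names` dict above)
```

Unlike the hand-rolled version, the members are Enum objects rather than plain ints; enum.IntEnum gives members that also compare equal to integers.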

Stats functions

A quick collection of statistics functions:

import numpy as np

class stats():

	@staticmethod
	def dev(xs):
		"""Calculate the deviations from the mean in a sample."""
		xs = np.asarray(xs, dtype=float)
		return xs - np.mean(xs)

	@staticmethod
	def cov(xs, ys, sample=True):
		"""
		Calculate the covariance between two data sets.
			sample=True : sample covariance (divide by n-1)
			sample=False: population covariance (divide by n)
		"""
		dec = 1 if sample else 0
		#sum of products of deviations
		return np.dot(stats.dev(xs), stats.dev(ys)) / (len(xs) - dec)

	@staticmethod
	def var(xs, sample=True):
		dec = 1 if sample else 0 #sample variance: divide by n-1
		return np.sum(stats.dev(xs)**2) / (len(xs) - dec)

	@staticmethod
	def std_dev(xs, sample=True):
		return np.sqrt(stats.var(xs, sample))

	@staticmethod
	def corr(xs, ys, sample=True):
		"""Calculate the Pearson correlation."""
		#use the same n or n-1 denominator as the covariance
		varx = stats.var(xs, sample)
		vary = stats.var(ys, sample)
		return stats.cov(xs, ys, sample) / np.sqrt(varx * vary)

	@staticmethod
	def rank(xs):
		"""Rank of each element (0 = smallest); ties are not averaged."""
		return np.argsort(np.argsort(xs))

	@staticmethod
	def spearman_corr(xs, ys, sample=True):
		"""Spearman correlation: the Pearson correlation of the ranks."""
		xranks = stats.rank(xs)
		yranks = stats.rank(ys)
		return stats.corr(xranks, yranks, sample)

	@staticmethod
	def r2(ys, residuals):
		"""Coefficient of determination: 1 - var(residuals) / var(ys)."""
		return 1 - stats.var(residuals) / stats.var(ys)

	@staticmethod
	def auto_corr(xs, lag=1, sample=True):
		"""
		Calculate autocorrelation coefficients for a time series or list of values.
			lag : how many lagged coefficients to calculate (lag zero is skipped).
				!! The lag should be at most the length of the data minus 2.
		"""
		xs = np.asarray(xs, dtype=float)
		if lag > len(xs) - 2:
			raise ValueError("Lag (%s) is bigger than data length (%s) - 2" % (lag, len(xs)))
		ac = np.zeros(lag)
		for i in range(1, lag+1):
			ac[i-1] = stats.corr(xs[:-i], xs[i:], sample)
		return ac
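One way to sanity-check formulas like these is to compare them against NumPy's built-ins: np.var with ddof=1 gives the sample variance, np.cov returns the (sample) covariance matrix and np.corrcoef the Pearson correlation matrix. The sketch below recomputes a few values by hand on made-up data and compares them. It also checks the R² formula: for a least-squares line with an intercept, 1 - var(residuals)/var(ys) equals the squared Pearson correlation.

```python
import numpy as np

# made-up sample data for the check
xs = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 10.0])
ys = np.array([1.0, 3.0, 2.0, 6.0, 8.0, 9.0])
n = len(xs)

dx, dy = xs - xs.mean(), ys - ys.mean()      # deviations
sample_cov = dx @ dy / (n - 1)
sample_var_x = dx @ dx / (n - 1)
sample_var_y = dy @ dy / (n - 1)
pearson = sample_cov / np.sqrt(sample_var_x * sample_var_y)

assert np.isclose(sample_cov, np.cov(xs, ys)[0, 1])   # np.cov divides by n-1 by default
assert np.isclose(sample_var_x, np.var(xs, ddof=1))
assert np.isclose(pearson, np.corrcoef(xs, ys)[0, 1])

# R^2 of a least-squares line: 1 - var(residuals)/var(ys), equal to r^2 here
slope, intercept = np.polyfit(xs, ys, 1)
residuals = ys - (slope * xs + intercept)
r2 = 1 - np.var(residuals, ddof=1) / np.var(ys, ddof=1)
assert np.isclose(r2, pearson ** 2)

# lag-1 autocorrelation, computed the same way as auto_corr(xs, lag=1)
lag1 = np.corrcoef(xs[:-1], xs[1:])[0, 1]
print(round(pearson, 4), round(r2, 4), round(lag1, 4))
```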

JS: range()

JavaScript does not have a built-in range() function, but one can be implemented using generators. Here are two versions, range2() and range():

function* range2(start, end, step) {
	step = step || 1
	//check the end condition before yielding, then recurse
	if (step > 0 && start >= end) return;
	if (step < 0 && start <= end) return;
	yield start;
	yield* range2(start + step, end, step);
}

function* range(start, end, step) {
	step = step || 1
	if (step > 0) { for (let i = start; i < end; i+= step) yield i; }
	else { for (let i = start; i > end; i+= step) yield i;}
}


console.log([...range2(0,6)])
console.log([...range2(0,10,3)])
console.log([...range2(10,0,-3)])
console.log([...range(0,11,2)])
console.log([...range(10,-1,-2)])

-----

[ 0, 1, 2, 3, 4, 5 ]
[ 0, 3, 6, 9 ]
[ 10, 7, 4, 1 ]
[ 0, 2, 4, 6, 8, 10 ]
[ 10, 8, 6, 4, 2, 0 ]
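Python already has a built-in range() for integers, but the same generator technique carries over directly, with yield from playing the role of yield*. A sketch of a recursive version mirroring range2, here named frange (a hypothetical name) because it also accepts float steps:

```python
def frange(start, end, step=1):
    """Yield start, start+step, ... up to but excluding end (recursive, like range2)."""
    if step == 0:
        raise ValueError("step must not be zero")
    if (step > 0 and start >= end) or (step < 0 and start <= end):
        return
    yield start
    yield from frange(start + step, end, step)

print(list(frange(0, 6)))        # [0, 1, 2, 3, 4, 5]
print(list(frange(10, 0, -3)))   # [10, 7, 4, 1]
print(list(frange(0, 1, 0.25)))  # [0, 0.25, 0.5, 0.75]
```

Each element adds one more nested generator, so very long ranges may hit Python's recursion limit (1000 by default); a plain while loop avoids that.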

Combinatorics: (n choose r)

Here is a quick function to calculate (n choose r), i.e. the number of ways to choose r items out of a total of n items.

from functools import reduce
import operator as op

def nCr(n,r):
	assert 0 <= r <= n, "nCr : need 0 <= r <= n"
	r = min(r,n-r) #symmetry: C(n, r) == C(n, n-r)
	if r == 0 : return 1
	numerator = reduce(op.mul, range(n,n-r,-1))
	denominator = reduce(op.mul, range(1, r+1) )
	return numerator//denominator

And here it is in JavaScript:

This uses several interesting tricks that I should probably explain in a separate post: generators, range(), and the spread (splat) operator.

function* range(start, end, step) {
	step = step || 1
	if (step > 0) { for (let i = start; i < end; i+= step) yield i; }
	else { for (let i = start; i > end; i+= step) yield i;}
}

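//local helper instead of adding a property to the built-in Math object
const mul = (a, b) => a * b

function nCr(n, r) {
	if (r > n) throw new RangeError('nCr: r > n')
	r = Math.min(r, n - r)
	if (r === 0) return 1
	//the numerator is always a multiple of the denominator,
	//but results above Number.MAX_SAFE_INTEGER lose precision
	const numerator = [...range(n, n - r, -1)].reduce(mul, 1)
	const denominator = [...range(1, r + 1)].reduce(mul, 1)
	return Math.floor(numerator / denominator)
}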
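Since Python 3.8 the standard library also provides math.comb(n, r). If you want to avoid building the full numerator product first, a multiplicative sketch keeps every intermediate value an exact integer (nCr_mult is a hypothetical name):

```python
import math

def nCr_mult(n, r):
    """(n choose r) via the multiplicative formula, using exact integer arithmetic."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    r = min(r, n - r)  # symmetry: C(n, r) == C(n, n - r)
    result = 1
    for i in range(1, r + 1):
        # after this step result == C(n - r + i, i), so the division is exact
        result = result * (n - r + i) // i
    return result

print(nCr_mult(5, 2))     # 10
print(math.comb(5, 2))    # 10
```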

JS: Copy array vs Reference

In JavaScript (as in many other languages) arrays are handled by reference: assigning an array to another variable, or passing it to a function, does not copy it. That is a sane default, but sometimes you want to work on a copy so that the original is not modified when you make changes.

Here is how by-reference behavior works:

a = [1,2,3,4,5]

b = a
b[2] = 55

console.log(a)

------

[ 1, 2, 55, 4, 5 ]

…changing b changes a too, because a and b point to the same array.

But if we copy the array, a stays the same.

Below are two ways to copy arrays in JavaScript: slicing, or the spread (triple-dot) operator. Both make a shallow copy, so nested arrays such as [6,[7],8] are still shared between a and b.

a = [1,2,3,4,5,[6,[7],8],9]

b = a.slice()
b[2] = 55

console.log('slice-copy : ')
console.log(a)

a = [1,2,3,4,5,[6,[7],8],9]

b = [...a] //spread copy; new Array(...a) misbehaves when a holds a single number
b[2] = 55

console.log('... copy : ')
console.log(a)

-----

slice-copy : 
[ 1, 2, 3, 4, 5, [ 6, [ 7 ], 8 ], 9 ]
... copy : 
[ 1, 2, 3, 4, 5, [ 6, [ 7 ], 8 ], 9 ]

The same applies to objects (hashes): assignment shares the object, while the spread operator makes a (shallow) copy.

a = { 'key1' : 'val1', 'key2' : 'val2' }

b = a //reference

b['key1'] = 'val55'

console.log(a)

a = { 'key1' : 'val1', 'key2' : 'val2' }

b = { ... a } //copy

b['key1'] = 'val55'

console.log(a)

-----

{ key1: 'val55', key2: 'val2' }
{ key1: 'val1', key2: 'val2' }
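Because slice() and the spread operator only copy the top level, b[5][0] = 66 would still change a. Python lists and dicts behave the same way, and the standard copy module provides a deep copy; a small Python analog for comparison:

```python
import copy

a = [1, 2, 3, 4, 5, [6, [7], 8], 9]

shallow = a[:]             # like a.slice() / [...a]: new outer list, shared inner lists
deep = copy.deepcopy(a)    # recursively copies the nested lists too

shallow[2] = 55            # top-level change: a is unaffected
shallow[5][0] = 66         # nested change: visible through a as well
deep[5][1][0] = 77         # change inside the deep copy: a is unaffected

print(a)  # [1, 2, 3, 4, 5, [66, [7], 8], 9]

d = {'key1': 'val1', 'key2': 'val2'}
d2 = {**d}                 # like { ...a } in JavaScript: a shallow dict copy
d2['key1'] = 'val55'
print(d)  # {'key1': 'val1', 'key2': 'val2'}
```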

Distance measures

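Example implementations of several similarity and distance measures. The first two work on binary vectors (sdr1, sdr2) that support &, ^, ~ and .count(); the last two work on NumPy arrays and compare a vector against every row of a matrix. When I have some time I will elaborate on their usage.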

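#Simple matching coefficient: share of positions where both vectors agree (1-1 or 0-0)
def smc(sdr1, sdr2):
	n11 = (sdr1 & sdr2).count()
	n00 = (~sdr1 & ~sdr2).count()
	return float(n11+n00)/len(sdr1)

#Jaccard similarity: shared 1s divided by the positions where either vector has a 1
def jaccard(sdr1,sdr2):
	n11 = (sdr1 & sdr2).count()
	n01_10 = (sdr1 ^ sdr2).count()
	return n11/float(n01_10+n11)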


import numpy as np

#cosine similarity between vector and each row of matrix (same order as the rows)
def cosine_similarity(vector,matrix):
	sim = np.sum(vector*matrix,axis=1) / ( np.sqrt(np.sum(matrix**2,axis=1)) * np.sqrt(np.sum(vector**2)) )
	return sim

#Euclidean distance between vector and each row of matrix
def euclidean_distance(vector, matrix):
	dist = np.sqrt(np.sum((vector - matrix)**2,axis=1))
	#dist = np.linalg.norm(vector - matrix, axis=1)
	return dist
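The set-based measures above need a bit-vector type with &, ^, ~ and .count() (bitarray objects would fit, but that is an assumption about the original setup). A sketch of the same two measures with plain NumPy boolean arrays instead, on a small made-up example:

```python
import numpy as np

def smc(a, b):
    """Simple matching coefficient: share of positions where a and b agree."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    return np.count_nonzero(a == b) / a.size

def jaccard(a, b):
    """Jaccard similarity: |a AND b| / |a OR b|, counting only positions set to 1."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.count_nonzero(a | b)
    # convention chosen here: two all-zero vectors count as identical
    return np.count_nonzero(a & b) / union if union else 1.0

x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 0, 0, 1, 1]
print(smc(x, y))      # 4 matching positions out of 6 -> 0.666...
print(jaccard(x, y))  # 2 shared ones out of 4 set anywhere -> 0.5
```

The difference matters for sparse vectors: SMC counts shared zeros as agreement, so two mostly-empty vectors look very similar, while Jaccard ignores positions where both are 0.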