myriad

Given Y = 1/i calculate the next Y

I’m generating the following sequence , :
$$y_i = 1/i$$

where :

$$i = 1 \dots \infty$$
Given that I have the calculated-value at point x :

$$y_x$$

how should I calculate the next value $$y_{x+1}$$

I don’t know the index ‘x‘, I know only the calculated value.

For example, if $$y_x = 0.167$$ then $$y_{x+1} = 0.143$$ i.e. 1/6 implies 1/7

After a calculation I keep only the calculated value, not the current index.

Here is the formula :

$$ y_{x+n} = \frac {1} {{\frac 1 {y_{x}}}+n}$$


Detailed solution :

$$y_x = k; k= {\frac 1x}; x = {\frac 1k}$$

$$ y_{x+n} = \frac {1} {x+n} = \frac {1} {{\frac 1k}+n} = \frac {1} {{\frac 1 {y_{x}}}+n}$$

Slices, Ranges and __getitem__()

One annoyance when implementing __getitem__() is debugging the slice syntax interactively .. Below you can see an example.

First you create a dummy class with simple method that print its arguments.

Then you can experiment.

class Blah: 
  def __getitem__(self,*args): 
    print(args) 
    return args

b = Blah()
b[1]
b[2:5]
b[::2,2:11:2]

s = b[::3]

print()

#filling the indices
ind = s[0].indices(12)
print(f'indices: {s[0]} => {ind}')

#make it a range
rng = range(*s[0].indices(12))
print(f'range: {rng}')

for i in rng: print(i)
(1,)
(slice(2, 5, None),)
((slice(None, None, 2), slice(2, 11, 2)),)
(slice(None, None, 3),)

indecies: slice(None, None, 3) => (0, 12, 3)
range: range(0, 12, 3)
0
3
6
9

In addition inside your method you can convert a slice to a range by using the .indices() method … and then use them to do processing in a loop.

Incremental average

There are three ways to calculate an Average depending on the way we receive the data :

  1. Basic : we have all the data. In this case we just use the basic well known formula : $$Avg = \frac{1}{n} \sum_{i=0}^n x_i$$
  2. Moving average : calculated using rolling window
  3. Incremental average : the one that we will discuss now

The idea is to calculate the Basic average at every step w/o recalculating it from the whole sequence i.e. the data comes one value at a every time step.

Here is the formula :

$$a_n = a_{n-1} + \frac{x_n – a_{n-1}}{n}$$

here is how you can use it as a python closure function, so that you don’t have to carry the state :

def iavg():
  avg = 0
  n = 0
  def calc(value):
    nonlocal n,avg
    n += 1
    avg = avg + ((value - avg) / n)
    return avg
  return calc
  
avg = iavg()
print(f'2 => {avg(2)}')
print(f'4 => {avg(4)}')
print(f'6 => {avg(6)}')

-----

2 => 2.0
4 => 3.0
6 => 4.0

# (2+4+6) / 3 = 12/3 = 4

Timing code execution

Python have timeit module to test how long a piece of code takes to execute. On the other ipython %timeit magic gives much more useful information.

$ python3 -m timeit -s "5 == 55"
100000000 loops, best of 3: 0.00543 usec per loop

In [23]: %timeit 5 == 55                                                                                                                                                     
18.3 ns ± 0.393 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Luckily @Numerlor was kind enough to extract the functionality from ipython for us

In [26]: nice_timeit('5 == 55')                                                                                                                                              
18.4 ns +- 0.328 ns per loop (mean +- std. dev. of 7 runs, 10000000 loops each)
import math
import timeit


def _format_time(timespan, precision=3):
    """Formats the timespan in a human readable form"""
    units = ["s", "ms", "\xb5s", "ns"]
    scaling = [1, 1e3, 1e6, 1e9]
    if timespan > 0.0:
        order = min(-int(math.floor(math.log10(timespan)) // 3), 3)
    else:
        order = 3
    scaled_time = timespan * scaling[order]
    unit = units[order]
    return f"{scaled_time:.{precision}g} {unit}"


class TimeitResult(object):
    """
    Object returned by the timeit magic with info about the run.

    Contains the following attributes :

    loops: (int) number of loops done per measurement
    repeat: (int) number of times the measurement has been repeated
    best: (float) best execution time / number
    all_runs: (list of float) execution time of each run (in s)
    compile_time: (float) time of statement compilation (s)
    """

    def __init__(self, loops, repeat, best, worst, all_runs, compile_time, precision):
        self.loops = loops
        self.repeat = repeat
        self.best = best
        self.worst = worst
        self.all_runs = all_runs
        self.compile_time = compile_time
        self._precision = precision
        self.timings = [dt / self.loops for dt in all_runs]

    @property
    def average(self):
        return math.fsum(self.timings) / len(self.timings)

    @property
    def stdev(self):
        mean = self.average
        return (
            math.fsum([(x - mean) ** 2 for x in self.timings]) / len(self.timings)
        ) ** 0.5

    def __str__(self):
        return "{mean} {pm} {std} per loop (mean {pm} std. dev. of {runs} run{run_plural}, {loops} loop{loop_plural} each)".format(
            pm="+-",
            runs=self.repeat,
            loops=self.loops,
            loop_plural="" if self.loops == 1 else "s",
            run_plural="" if self.repeat == 1 else "s",
            mean=_format_time(self.average, self._precision),
            std=_format_time(self.stdev, self._precision),
        )


def nice_timeit(
    stmt="pass",
    setup="pass",
    number=0,
    repeat=None,
    precision=3,
    timer_func=timeit.default_timer,
    globals=None,
):
    """Time execution of a Python statement or expression."""

    if repeat is None:
        repeat = 7 if timeit.default_repeat < 7 else timeit.default_repeat

    timer = timeit.Timer(stmt, setup, timer=timer_func, globals=globals)

    # Get compile time
    compile_time_start = timer_func()
    compile(timer.src, "<timeit>", "exec")
    total_compile_time = timer_func() - compile_time_start

    # This is used to check if there is a huge difference between the
    # best and worst timings.
    # Issue: https://github.com/ipython/ipython/issues/6471
    if number == 0:
        # determine number so that 0.2 <= total time < 2.0
        for index in range(0, 10):
            number = 10 ** index
            time_number = timer.timeit(number)
            if time_number >= 0.2:
                break

    all_runs = timer.repeat(repeat, number)
    best = min(all_runs) / number
    worst = max(all_runs) / number
    timeit_result = TimeitResult(
        number, repeat, best, worst, all_runs, total_compile_time, precision
    )

    # Check best timing is greater than zero to avoid a
    # ZeroDivisionError.
    # In cases where the slowest timing is lesser than a microsecond
    # we assume that it does not really matter if the fastest
    # timing is 4 times faster than the slowest timing or not.
    if worst > 4 * best and best > 0 and worst > 1e-6:
        print(
            f"The slowest run took {worst / best:.2f} times longer than the "
            f"fastest. This could mean that an intermediate result "
            f"is being cached."
        )

    print(timeit_result)

    if total_compile_time > 0.1:
        print(f"Compiler time: {total_compile_time:.2f} s")
    return timeit_result


# nice_timeit("time.sleep(0.3)", "import time")

# IPython license
# BSD 3-Clause License
#
# - Copyright (c) 2008-Present, IPython Development Team
# - Copyright (c) 2001-2007, Fernando Perez <fernando.perez@colorado.edu>
# - Copyright (c) 2001, Janko Hauser <jhauser@zscout.de>
# - Copyright (c) 2001, Nathaniel Gray <n8gray@caltech.edu>
#
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
#   list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
#   this list of conditions and the following disclaimer in the documentation
#   and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
#   contributors may be used to endorse or promote products derived from
#   this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Cython: Tips and tricks

if you have more tricks comment below …

The last couple of days I started learning Cython …

What is Cython ?

Cython is compiled language that seamlesly integrates C with Python, plus more …

C, Cython, Python

The nice thing about is that the learning curve is very gradual. You can start w/o even changing your Python code.
Simply compiling it may speed up your script.

The next step is to start using Cython-the-language constructs.
Those include :

  • Declaring the type of the variables
  • Specifying who and how to call Functions/methods
  • Extension types : Classes which are implemented using Struct, instead of Dict allowing the dispatch resolution to happen at compile time, rather than runtime.

And finally you have the syntax to integrate directly C/C++ libraries and code.

Now on the

tips and tricks …

Creaing an array

Use 1D array instead of lists or numpy array whenever you can.
Red somewhere it is twice as fast than numpy.
In addition you can dynamically .resize() it in-place.

Here is the fastest way to create empty/zeroth array. First you need to have array templates prepared :

from cpython cimport array

cdef iARY = array.array('i') #integer
cdef IARY = array.array('I') #unsigned integer
cdef fARY = array.array('f') #float
cdef dARY = array.array('d') #double

then :

cdef ary = array.clone(fARY, size, 1)

Other options are :

cdef ary = array.array('f')
array.resize(ary, size)
array.zero(ary)

slower variant, but works on other types too :

cdef ary = array.array('f')
array.resize(ary, size)
ary[:] = 0

more on this here : Fast zero’ing

Accessing array elements

Here are several ways to access elements of array … from slower to faster.

ary[i] = value
ary._f[i] = value
#fastest, cause access the union struct directly
ary.data.as_floats[i] = value

the last one sped some portions of my code by ~30 times.

There are variations of the example above depending on the type :

  • _f, _i, _u …..
  • as_floats, as_ints, as_uints …..

A different len()

from cpython.object cimport Py_SIZE
#does not work on range()
cdef inline unsigned int clen(obj): return Py_SIZE(obj)

generates cleaner code, it should be faster. cpdef’d version is slower, which is expected.

type vs isinstance

if you have to do type checks use “type is …” instead of isinstance(), especially if you do several of them.

: x=type(5) 

: %timeit x is int
27.2 ns ± 4.12 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

: %timeit x is float
26.4 ns ± 0.731 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

: %timeit isinstance(5,int) 
52.6 ns ± 0.237 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

: %timeit isinstance(5,float) 
74.2 ns ± 1.28 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

: %timeit isinstance(x,int)                                                                                                                                           
71.5 ns ± 0.357 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

: %timeit isinstance(x,float)                                                                                                                                         
81 ns ± 1.32 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

---

: %timeit type(5) is int                                                                                                                                              
55 ns ± 0.487 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

: %timeit type(5) == int                                                                                                                                              
57.6 ns ± 1.59 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

: %timeit type(5) is float                                                                                                                                            
58 ns ± 1.26 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Python : How to check if variable is X ?

Here is how you can check the type of a variable :

In [240]: isinstance(5, int)                                                                                                                                                 
Out[240]: True

In [241]: var = 5                                                                                                                                                            

In [242]: isinstance(var, int)                                                                                                                                               
Out[242]: True

In [243]: isinstance(var, str)                                                                                                                                               
Out[243]: False

In [244]: isinstance([1,2,3], list)                                                                                                                                          
Out[244]: True

#is it one of many types
In [245]: isinstance([1,2,3], (list,tuple))                                                                                                                                  
Out[245]: True

Here is a function you can use to check if variable is an iterator :

def is_iter(x):
  try:
    iter(x); return True
  except TypeError: return False

Numpy : min/max of integer types

Here is how to find the MIN or MAX values for the Integer types in numpy :

In [235]: np.iinfo(np.int8)                                                                                                                                                  
Out[235]: iinfo(min=-128, max=127, dtype=int8)

In [236]: np.iinfo(np.int16)                                                                                                                                                 
Out[236]: iinfo(min=-32768, max=32767, dtype=int16)

In [237]: np.iinfo(np.int32)                                                                                                                                                 
Out[237]: iinfo(min=-2147483648, max=2147483647, dtype=int32)

In [238]: np.iinfo(np.int64)                                                                                                                                                 
Out[238]: iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)

In [239]: np.iinfo(np.int)                                                                                                                                                   
Out[239]: iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)

Perl: __DATA__ section

This is my first Perl post. I wanted to explore a less known feature of Perl which is very useful at times.

Lets say as in my case that you need to combine and then compress .js and .css files from all over the place.
What I mean is that instead of importing in your html multiple different files you want to just import one compressed javascript file and one compressed css file.

Your first instinct may be, is to create a hash and describe all files locations … but may be the simple and dirty approach is better, enter the Perl __DATA__ section.

It is very clever idea, what if you can treat part of the source code file as a different file i.e. file within a file. Simply said everything after the __DATA__ token could be accessed via the <DATA> file handle.

Saying : @array = <DATA> will slurp everything into the @array, one line per element. In the example below i’m also removing the new line symbol.

So how do we setup which files to compress ? Simple, first type __DATA__ at the end of the file and then just use any command to append the path/file spec. I used those :

  • ls -w1 -d $PWD/blah/*.js >> compress.pl
  • ls -w1 -d $PWD/blah/*.css >> compress.pl

… several times. Don’t forget use append >>, not overwrite >

What does the script do :

  • read the __DATA__ section into the @files array
  • filter .js and .css and create a space separated string
  • using the strings from above build commands to concatenate the two groups
  • run the tools to compress concatenated files
#!/usr/bin/env perl
use strict;

sub run {
	my $cmd = shift @_;
	print "> $cmd\n";
	qx/$cmd/
}

#concatenated files
my $cat_js = 'cat.js';
my $cat_css = 'cat.css';
my $out = 'out';
#define compress commands
my $js_cmd = "/usr/local/bin/uglifyjs -c -m -- $cat_js > $out.js";
my $css_cmd = "yui-compressor --type css $cat_css > $out.css";

#read the DATA section
my @files = map {s/\n//;$_} <DATA>;
#extract js and css file
my $js_files  = join ' ', (grep /js$/, @files);
my $css_files = join ' ', (grep /css$/, @files);

#concatenate files
my $js_cat  = "cat $js_files > $cat_js";
my $css_cat = "cat $css_files > $cat_css";

run($js_cat);
run($js_cmd);

run($css_cat);
run($css_cmd);

#ls -w1 -d $PWD/*.*  >> compress.pl

__DATA__

One more thing, what is the purpose of the first line …

Shebang

If you make the file executable and have :

#!/usr/bin/perl

or

#!/usr/bin/env perl

as a first line then instead of calling the script like this :

perl compress.pl

you can do it this way :

./compress.pl

Prolog: Saving the fact+rules to file

Sometimes you may wish you can save your current state to file. Here is how you do it

save(Heads,File) :- tell(File), listing(Heads), told.

you can use the same idea to save the result of other predicates to file.

Enums

Python does not have builtin Enumerable type, but we can easily resolve this problem :

def enum(args, start=0):
	class Enum(object):
		__slots__ = args.split()
		names = {}

		def __init__(self):
			for i, key in enumerate(Enum.__slots__, start):
				self.names[i] = key
				setattr(self, key, i)

	return Enum()


e = enum('ONE TWO THREE', start=1)

print(e.ONE)
print(e.TWO)
print(e.THREE)

-----

1
2
3