Climate Change, Part 1 – Rainfall: Is it declining?

I recently did some of Coursera's data science modules as a way to sharpen my skills in big data. The modules use R to process the data and do the statistical analysis. Using this knowledge, I will try to analyze Tobago's precipitation and temperature for the period 1980 to 2015. The data can be found at the Trinidad & Tobago Meteorological Service. This analysis will be in two parts: rainfall trends and temperature trends.

For the past few (10?) years Tobago has been experiencing rather harsh drought-like conditions. Each year, the Water and Sewerage Authority (WASA) has implemented its Water Supply and Conservation plans in order to ensure a supply. Some of these measures included water rationing and a ban on watering plants or washing cars with a hose.

As a farmer, I felt the effects of the heat and lack of rainfall during the wet/rainy season. I had to use pipe-borne water (via a tank) almost every day. Fortunately, I was able to harvest my crops, though I believe the yield would have been higher if the rainfall was "normal".

The main reason WASA has given for the shortfall in water is that rainfall has been steadily declining over the years, while the public believes WASA did not plan for increased usage by the population. So which is true: no planning for increased usage, or decreasing rainfall over time? Hey, maybe both! WASA has not given any "numbers" to back up its assertion, and there has been a marked increase in usage due to a higher number of visitors to the island, especially during the July/August months.

So is WASA's reason for the shortfall true?


How does your computer resolve cnn.com in the browser?

I was recently asked this question as a follow-up to the interview question "what happens when you type cnn.com in the browser?". The answer to the main question is given at a high level, i.e., it starts with checking the browser's cache for the domain name. If the entry exists, the browser uses it; if it's not found in the cache, it checks the router's routing table, etc.

So when I was asked to drill down a bit further (how is the domain name resolved on the local computer?), I responded that it checks the browser's cache. Wrong!! It turns out that domain lookups are stored in a DNS cache, which is a temporary database maintained by the operating system. So this is the first place the computer looks in order to resolve the domain name.
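As a quick illustration, here is a minimal Python sketch that hands a lookup to the operating system's resolver (cnn.com is just the hostname from the question; the output will vary):

import socket

# The lookup is delegated to the operating system's resolver, which typically
# checks its DNS cache and the hosts file before querying a DNS server.
print(socket.gethostbyname("cnn.com"))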

For more information, click here.

Bash basics example

List the number of lines in all of the csv data files.
wc -l *.csv

Next, remove the line containing the word “total”.
wc -l *.csv | grep -v total

Display the file with the fewest lines.
wc -l *.csv | grep -v total | sort -n | head -n1

Store results in a file
wc -l *.csv | grep -v total | sort -n | head -n1 > results.csv

Bash basics 2

Summary:
head and tail select rows,
cut selects columns, and
grep "phrase" selects lines that contain "phrase".
Note: grep -v: invert the match, i.e., only show lines that don’t match

** Writing to an output file
command > output_file

** To combine commands, use the pipe (|) — The pipe symbol tells the shell to use the output of the command on the left as the input to the command on the right.
head -n5 myfile.csv | tail -n3
cut -d, -f2 filename.csv | grep -v col_name — this passes the 2nd column of the file to grep, which inverts the match (shows only the lines that do not contain col_name), so the header row is removed

** Counting the number of characters, words, or lines
wc (word count)
options:
wc -c (count characters)
wc -w (count words)
wc -l (count lines)

Other wild cards used for matching
? matches a single character, so 201?.txt will match 2017.txt or 2018.txt, but not 2017-01.txt.
[…] matches any one of the characters inside the square brackets, so 201[78].txt matches 2017.txt or 2018.txt, but not 2016.txt.
{…} matches any of the comma-separated patterns inside the curly brackets, so {*.txt,*.csv} matches any file whose name ends with .txt or .csv, but not files whose names end with .pdf.

Sorting
Puts data in order (ascending by default).
sort -n (sort numerically)
sort -r (reverses the order)
sort -b (ignore leading blanks)
sort -f (fold case, i.e., be case-insensitive)

Removing duplicates
uniq — removes adjacent duplicate lines
Note: uniq -c — displays unique lines with a count of how often each occurs (rather than piping uniq into wc)
e.g. cut -d, -f2 filename | grep -v Tooth | sort | uniq -c

Bash basics

Environment variables
— are in all caps, e.g. HOME, USER
— the set command displays all the environment variables
Getting the value of an environmental variable:
1. echo $HOME — note the $ as the first character
2. set | grep HISTFILESIZ

Shell (local) variables:
— simply assign a value to a name, e.g. name=Chris (note: NO SPACES around the equals sign!!)
— To display variable content: echo $name
data_file=data/somefile.csv
head -n 1 $data_file — displays the first line in the file

Bash allows the use of the following loops:
— for and while
For loop format:
for item in list_of_items
do
    echo $item
done

The above can be written on one line: for name in Tom Harry Jane; do echo $name; done — NOTE the semicolons separating the statements; do and done are keywords.
Wild cards can be used. E.g. for filename in data/*.csv; do echo $filename; done

File manipulation
— View a file's contents
— head data/filename.csv — displays the first 10 (default) lines
— head -n 1 filename.csv — displays the first line

— tail data/filename.csv — displays the last 10 (default) lines
— tail -n 1 data/filename.csv — displays the last line

— head -n15 filename.csv | tail -n 5 — passes the first 15 lines of the file to the tail command which in turn displays lines 11 to 15

— cat filename — displays the entire contents of the file to the screen
— more/less filename — allows you to page thru the contents of the file

— cut -d, -f2-5,8 filename.csv — displays selected columns from a file:
— -d, = delimiter is a comma (tab is the default)
— -f2-5,8 = fields (columns) to display: columns 2 through 5, and 8
— doesn’t understand quoted strings

Python Variable

Variable Assignment:

n = 300 — Python creates an object of type integer in memory and assigns n as a pointer to its location.
m = n — Python does NOT create a new integer object, but assigns m as a pointer to the same location.
n = "hello" — n no longer points to the location of 300 but now points to the location of the string object; m continues to point to the location of 300.
m = 40.3 — m now points to the location of the float object; 300 is orphaned and there is no way to access it.

Note: the built-in id() function shows you the identity of the object (in CPython, its memory address).

Something interesting:
In [55]: n=300

In [56]: m=300

In [57]: id(n)
Out[57]: 4395607760

In [58]: id(m)
Out[58]: 4397257296
*** The variables point to objects at two different locations


In [51]: n=30

In [52]: m=30

In [53]: id(n)
Out[53]: 4304949184

In [54]: id(m)
Out[54]: 4304949184
**Both variables point to the SAME object location

Here, m and n are separately assigned to integer objects having value 30. But in this case, id(m) and id(n) are identical!

For purposes of optimization, the interpreter creates objects for the integers in the range [-5, 256] at startup, and then reuses them during program execution. Thus, when you assign separate variables to an integer value in this range, they will actually reference the same object.

Python Files

Use the built-in open() function to access the contents of a file.
example:
FH = open("filename", "mode") –> returns a file object that can be used to manipulate the file's contents.
A better way (the file is closed automatically when the block ends):
with open("filename", "mode") as FH: –> FH is the file object
    FH.read()

NOTE:
Can loop over the file object:
for line in FH:
    print(line)

mode:
r (default) = read from a file
w = write to file (will over-write previous content or create file if it does not exist)
a = append to a file

File object methods:
FH.read() –> returns the entire content of the file as a single string
FH.readline() –> returns a single line
FH.readlines() (or list(FH)) –> returns a list of all the lines in the file
FH.write() –> writes content to the file (see the example below)
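Putting these together, a minimal sketch (the filename example.txt is just for illustration):

# Write a couple of lines, then read them back.
with open("example.txt", "w") as FH:   # "w" creates or overwrites the file
    FH.write("first line\n")
    FH.write("second line\n")

with open("example.txt") as FH:        # mode defaults to "r"
    for line in FH:                    # loop over the file object
        print(line.strip())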


Libraries:
CSV and JSON

CSV Methods:
reader() –> returns a reader object. Each row read from the csv file is returned as a list of strings. No automatic data type conversion is performed.
DictReader() –> returns a reader object that maps the information in each row to a dict whose keys are taken from the header row
writer() –> returns a writer object responsible for converting the data into delimited strings

CSV Example:
import csv

count = 0

with open("filename") as FH:
    datareader = csv.reader(FH, delimiter=',')  # returns a reader object
    for row in datareader:  # each row is a list of strings
        if count == 0:  # header row
            count += 1
        elif count > 0 and count < 5:
            print(row[0])  # each item in the list is a string
            count += 1

Python files (CSV data)


Libraries:
CSV and JSON

CSV Methods:
reader() –> returns a reader object. Each row read from the csv file is returned as a list of strings. No automatic data type conversion is performed.
DictReader() –> returns a reader object that maps the information in each row to a dict whose keys are taken from the header row
writer() –> returns a writer object responsible for converting the data into delimited strings

CSV Example:

Sample of the data:
datetime,host,src,proto,type,spt,dpt
2013-03-03 21:53:59,groucho-oregon,1032051418,TCP,,6000,1433

import csv

count = 0

with open("filename") as FH:
    datareader = csv.reader(FH, delimiter=',')  # returns a reader object
    for row in datareader:  # each row is a list
        if count == 0:  # header row
            count += 1
        elif count > 0 and count < 5:
            data_time = row[0]
            print(data_time.split(','))
            count += 1
        else:
            break
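The method list above mentions DictReader() and writer() without showing them; here is a minimal sketch against the same sample data (out.csv is just an illustrative output filename):

import csv

# Read each row as a dict keyed by the header row.
with open("filename") as FH:
    for row in csv.DictReader(FH):
        print(row["datetime"], row["proto"])

# Write rows back out as comma-delimited strings.
with open("out.csv", "w", newline="") as FH:
    writer = csv.writer(FH)
    writer.writerow(["datetime", "proto"])
    writer.writerow(["2013-03-03 21:53:59", "TCP"])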

   

Kubernetes and Docker

This post is where I plan to document my learning and experiences deploying both Kubernetes and Docker for managing Jupyter notebooks in an educational setting.

Kubernetes is defined as "a container orchestration and management tool for automating the deployment and monitoring of containerized applications." In my case, Docker containers.

The main parts of Kubernetes (there are others) are:
Master node
— manages the worker nodes in a cluster and the deployment of pods onto them
— coordinates all activities in your cluster, such as scheduling applications, maintaining applications’ desired state, scaling applications, and rolling out new updates.

Worker node:
— servers which run the application containers (in Pods) and other Kubernetes components such as proxies
— has a Kubelet, which is an agent for managing the node and communicating with the Kubernetes master. The node should also have tools for handling container operations, such as Docker or rkt.

Service: functions as a proxy to replicated pods. Service requests can be load balanced across pods.

Pod:
— is the basic object of deployment. Each pod has its own IP address and can contain one or more containers.
— contains the definition of how the containers should be configured and run. Kubernetes uses these definitions to maintain the necessary resources. For example, you can define that you need two pods; during execution, if a pod goes down, Kubernetes will automatically fire up a new one.

Link to article about Kubernetes Architecture below
An Introduction to Kubernetes

Kubernetes Networking:

— Every pod has its own IP address.
— Takes care of routing all internal requests between hosts and pods.
— External access is provided through a service, load balancer, etc.

Note: A Kubernetes cluster that handles production traffic should have a minimum of three nodes.

In order to deploy applications on Kubernetes, you tell the master to start the application containers. The master schedules the containers to run on the cluster's nodes. The nodes communicate with the master using the Kubernetes API, which the master exposes. End users can also use the Kubernetes API directly to interact with the cluster.
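As a small illustration of that last point, here is a sketch that uses the official kubernetes Python client to list the pods in a cluster (this assumes the client package is installed and a kubeconfig is available; it is not part of my deployment):

from kubernetes import client, config

# Load credentials from the local kubeconfig (e.g. ~/.kube/config).
config.load_kube_config()

v1 = client.CoreV1Api()
# Ask the API server for all pods, across namespaces.
for pod in v1.list_pod_for_all_namespaces(watch=False).items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.pod_ip)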

Database Notes

MySQL has been my database of choice for the web app projects I have worked on. My fellowship experience made me realize that my level of knowledge was limited by the size of my data and the complexity of the queries I was using. With that in mind, I have created this page to keep track of the new things I learn while doing the various SQL exercises.

JOINS:
A JOIN clause is used to combine rows from two or more tables, based on a related column between them.

INNER: Returns records that have matching values in both tables (the intersection)

LEFT OUTER: Returns all records from the left table, and the matched records from the right table

RIGHT OUTER: Returns all records from the right table, and the matched records from the left table

FULL OUTER: Not allowed in MySQL; emulate it by combining LEFT and RIGHT joins with UNION (see the example at the end of this post)

[Venn diagram images: INNER JOIN (http://cloudtobago.com/images/innerjoin.gif), LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN]

 

Table: Person

Column Name | Type
PersonId    | int
FirstName   | varchar
LastName    | varchar

PersonId is the primary key column for this table.


Table: Address

Column Name | Type
AddressId   | int
PersonId    | int
City        | varchar
State       | varchar

AddressId is the primary key column for this table.

Example question
Write a SQL query for a report that provides the following information for each person in the Person table, regardless of whether there is an address for that person:

Output: FirstName, LastName, City, State

Note: we want to produce a report of PEOPLE with or without addresses. To accomplish this, we can use the LEFT JOIN. LEFT JOIN creates a result with ALL the data from the left table (A) and merges in the matching information from the right table (B).

QUERY STATEMENT
SELECT FirstName, LastName, City, State
FROM Person LEFT JOIN Address ON Person.PersonId = Address.PersonId
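Since MySQL has no FULL OUTER JOIN (as noted above), here is a sketch of how to emulate one on the same tables by combining LEFT and RIGHT joins with UNION:

SELECT FirstName, LastName, City, State
FROM Person LEFT JOIN Address ON Person.PersonId = Address.PersonId
UNION
SELECT FirstName, LastName, City, State
FROM Person RIGHT JOIN Address ON Person.PersonId = Address.PersonId;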

I’m excited!

I recently did a DevOps fellowship at Insight. It was an intense and exhilarating experience. My reason for accepting the fellowship was so that I could learn about network infrastructure in the cloud. The world of data collection has changed drastically since my days as a system admin at a market research company. After suffering burnout, I moved into technical support, training, and web app programming. During all this time, I have never lost my love for infrastructure.

The fellowship gave me the opportunity to explore different areas of network infrastructure automation, including Continuous Integration/Deployment, monitoring, and Infrastructure as Code. The DevOps fellowship took place at the same time as the Data Engineering fellowships, so I was able to understand their role in the ecosystem of collecting and managing very large datasets.

Part of the fellowship included a project that implements an area of DevOps I am interested in. My project involved monitoring of the network (click here).

I am looking forward to gaining more DevOps experience so I can become an expert.
