Fake News Detection

Introduction

The Fake News Challenge (FNC) is a competition to explore how machine learning can contribute to the detection of fake news. The first stage of the challenge is to accomplish something called stance detection. Given a short headline and an article, we need to categorize the relationship between the article and headline into 4 categories: Disagree, Agree, Unrelated, and Discusses.

Read More

Creating an SSH Key for Github

SSH is a nice program that lets you update your repository from Github without asking you for the password every single time. However, SSH has 2 different protocols that work differently.

Read More

Dual Boot Ubuntu and Windows

Dual booting two different OS’s means that you have the choice to switch between the two OS’s at startup. This can be useful if you manage to break one of them, as the other can be used to recover. Many people dual boot Windows and a Linux distribution.

Read More

Dijkstra's Algorithm

Algorithm description

Dijkstra’s Algorithm, invented by famous computer scientist E.W. Dijkstra, is an algorithm used to find the shortest path between 2 points on a weighted graph. A weighted graph is a graph in which the each edge is associated with a numerical value. Depending on the context, that value can represent the cost of traversing the edge, the length of the edge, etc. Dijkstra’s Algorithm only works when edge lengths are nonnegative.

Read More

USACO Gold February 2014

In the problem Roadblock from USACO Gold February 2014, we are asked to sabatoge Farmer John’s commute to his barn by rolling a single haybale onto a path, doubling the time it takes to traverse that path.

Read More

Hash Table

Introduction

Hash Table is a data structure that is extremely efficient at storing and retrieving values.
A hash table achieves O(1) time for retrieval of a value, in comaprison to O(N) for unsorted array.

Read More

Eularian Paths

Intro

An Eulerian Path, named after mathematician Leonhard Euler, is a traveral of a graph that visits each edge exactly once.
The famous 7 Bridges of Konigsberg problem, first solved by Euler, gave rise to the idea of an Eulerian Path, and was the foundation of modern graph theory.
The city of Konigsberg contained 7 bridges. Citizens would attempt to walk around the city while traversing the bridges only once.
Below is a simplified version of Koningsberg

Read More

Directed Acyclic Graphs

A Directed Acyclic Graph (DAG) is a directed graph with no directed cycles.
This means that it is not possible to start from a vertex and come back to it by traversing the edges.
Remember that in a directed graph, edges can only be traversed in the direction of the arrow.
Graph 1 shows a DAG.

Read More

Graph Introduction

A graph is a collection of vertices (nodes) that are connected with a set of edges. The graph below has 4 nodes and 4 edges.

Read More

DFS/BFS

We will now explore 2 fundamental algorithms for processing graphs: Depth-First Search (DFS) and Breadth-First Search (BFS). These algorithms are used in a wide variety of problems.

Read More

Binary Search

Binary Search is an efficient method of searching in an ordered list.
If we have an unordered list and are told to search for a particular element, we have no choice but to search the entire list.
This is because when we examine an element in the list, we don’t get any information about the rest of the elements in the list.

However, when we examine an element in an ordered list, we do get information about other elements.
{-5,-2,0,2,4,6,8,9,10}
Let’s suppose we’re searching for 9. We begin by examining the 4th element (length of the list/2 = 4). There are three cases that could arise.

  1. The 4th element is equal to 9.
  2. The 4th element is less than 9.
  3. The 4th element is greater than 9.

It’s obvious that 2 (the 4th element of the list) is not 9. However, we can get some more information from our observation.
Since the list is in ascending order, we know that elements 0 - 3 are less than 2. This means that they are also less than 9, so we don’t need to investigate these elements.
In addition, we also know that 9 has to be in the upper half of the list, namely, elements 5 - 9.

Now, we are effectively searching along this list:
{6,8,9,10}
We compute the halfway point of this list ((5+9)/2=7).

The 7th element is 8, so our list is once again reduced to the following:
{9,10}.
Computing the halfway point again (element 8), we see that the 8th element is indeed 9, so we are done.

Binary Search is known as Decrease and Conquer algorithm because it reduces the size of the problem that we are solving.
Notice how at every iteration, we cut the size of the array we are searching on by half. This behavior gives Binary Search an excellent runtime of O(logn).

We’ll implement Binary Search recursively for simplicity:

//This algorithm returns the index of the target if found. Otherwise, returns -1. 
public int binSearch(int low, int high, int[] array, int target){ 
  if(high >= low){                           //low represents the left endpoint of the interval we are searching on
      int halfway = (low+high)/2;            //high represents the right endpoint of the interval we are searching on
      if(array[halfway] == target) {
        return halfway;                            //Case 1: We've found the target, so return
      }
      else if(array[halfway] < target) {                      //Case 2: The halfway element is too low.
        return binSearch(halfway + 1, high, array, target);   //So we search the upper half of the curreint interval
      }
       else {                                                  //Case 3: The halfway element is too high
        return binSearch(low, halfway - 1, array, target);     //So we search the lower half of the current interval
       }
  }
  return -1;                                  //At this point, we know that the target is not in the array.
                                              //We've narrowed down the possible location of the target to be a single number.
                                              //And that location wasn't the target.
}

We can actually improve this algorithm by a small amount. Note that integer overflow could occurr by doing (low+high)/2, even though the end result is actually lower than Integer.MAX_VALUE. We can change this to low + (high-low)/2 to avoid this problem.

Read More