Computer Science Intern Revision Primer

In the coming few days as of writing this, I’ll be going through what will hopefully be two successful phone interviews for a *insert large company name* intern position this summer. Below are some areas of CS which I think may be likely to come up. Please keep in mind this was written in three days, so while I hope there may be few mistakes, some may have slipped by unnoticed.

Coding:

It’s definitely worth while knowing basic CS code. i.e: Fibonacci in Java, Sum of Fibonacci numbers, reverse a string, check for palindromes, etc… JavaPuzzlers and Project Euler (Project Euler seems more maths based) are two sites which have plenty of sample questions to go over.

Binary Search (Search Algorithm)

public class search {

  public static void main(String[] args) {

    int c, first, last, middle, size, search, list[];

    search = 20;
    size = 12;
    list = new int[]{1,3,5,7,9,11,15,16,17,19,20,25}; // in binary search, data must be ordered.

    first = 0;
    last = size;
    middle = (first + last) / 2;

    while( first <= last){

      if (list[middle] < search){ 

        // If search value is greater than mid-point value, we know the value is not in the first half. Move 'first' boundary to 'middle'
        first = middle - 1;

      } else if (list[middle] == search){

        System.out.println("Found item at position " + middle + ", value: " + list[middle]);
        break;

      } else {

        // If search value is greater than mid-point value, we know the value is not in the last half. Move 'last' boundary to 'middle'.

        last = middle + 1;
      }

      middle = (first + last) / 2;
      System.out.println("First: " + first + " Middle: " +middle+ " Last:"  + last);
    }
  }

}

Binary search is a pretty fundamental algorithm. Know it! Its run-times are:

Best: O(n log(n))
Average: O(n log(n))
Worst: O(1)

 

Merge Sort (Sorting Algorithm + use of recursion!)

public class mergesort {
	private static int[] numbers;
	private static int[] helper;
	private static int number;

	public static void sort(int[] values){

		number = values.length;
		numbers = new int[number];
		helper = new int[number];
		numbers = values;
		mergesort(0, number - 1);

		for (int i = 0; i < number; i++){

			System.out.print(numbers[i] + ", ");

		}

	}

	public static void mergesort(int low, int high){

		if (low < high){
			int middle = low + (high-low) / 2;
			mergesort(low, middle);
			mergesort(middle + 1, high);
			merge(low, middle, high);
		}

	}

	public static void merge(int low, int middle, int high){

		for (int i = low; i <= high; i++){
			helper[i] = numbers[i];
		}

		int i = low;
		int j = middle + 1;
		int k = low;

		while(i <= middle && j <= high){
			if (helper[i] <= helper[j]){
				numbers[k] = helper[i];
				i++;

			} else {

				numbers[k] = helper[j];
				j++;

			}
			k++;
		}

		while(i <= middle){
			numbers[k] = helper[i];
			k++;
			i++;
		}

	}

	public static void main(String[] args){

		int[] toSort = new int[]{9,4,6,3,2,1};
		sort(toSort);

	}
}

Merge Sort is a pretty standard algorithm to know too. It’s a divide and conquer algorithm which has the following time complexities:

Best: O(n log(n))
Average: O(n log(n))
Worst: O(n log(n))

Interviewers may ask in which cases is Insertion-Sort is quicker than Quick-Sort. Insertion sort has a best case of O(n) and a worst case of O(n2) and as such, insertion sort is ideal when the list is almost ordered or the list is small. The overhead is also smaller then some algorithms which use recursion which may make it good for memory constraint systems, however in most cases memory is not an issue.

 

Construct/traverse data structures

Implement system routines

Distil data sets to single values

Transform one data set to another

Algorithms:

Big-O analysis

Sorting and Hashing
For sorting algorithms code, please see the Code section above. For theory, continue reading. In general, you should know one O(n log(n)) algorithm but preferable two, as well as knowing their differences. Here I will go over Merge-Sort and Quick-Sort for reasons I will soon explain.

Merge-Sort (Source) (Wikipedia Image: Source) (Video: Source)
Merge-Sort is a divide and conquer algorithm. You start by taking a long list of unsorted data, and you break it down into the smallest list possible, of size one. From there you start merging again. If we look at the image below taken from Wikipedia, on the fourth row where we see 38 and 27, the two integers are compared and in this case swapped to create a list of size 2 with the numbers 27 and 38 respectively. The process is continued and when there is one list left, the original list has been sorted.

Merge_sort_algorithm_diagram.svg

 

As listed in the ‘Coding’ section, the time complexity is:
Best: O(n log(n))
Average: O(n log(n))
Worst: O(n log(n))

Quick-Sort (Video: Source)

 

Data structures:

Study structures: famous NP-complete problems such as travelling sales & man knapsack problem)

Trees, basic tree construction, traversal and manipulation, hash table, stacks, arrays, linked lists, priority queues.

 

Hash Tables and Maps:

Hash Tables – (Source) (Source: Lecture slides) (Video: Source)

What are hash tables + learn to implement one using only arrays.

A hash table is a data structure used to implement an associative array; a structure which can map keys to values. The hash table uses a hashing function to compute an index into an array from which the correct value can be found. To retrieve the value, you simply pass in the value to the hash function, which returns an index. You can then use the index to return the data from the array.

The perfect hash function is injective. Simply put, if you pass in a value, you should have one unique hash value or index. However this is not always possible, and in such cases a collision occurs. There are two ways to resolve a collision: linear probing and separate chaining. As I have been taught about chaining, I’ll explain that in further detail, however there’s a great video by CS50 on youtube about hash tables (see source for link).

A simple and easy hash function to remember is: h(k) := k mod m

Chaining

When a collision occurs (and the video explains this nicely too), you can solve the problem using linked lists. If you have two values on index 1 in an array, a collision occurs. From there, you can create a doubly-linked list to the second object, resulting in something which looks similar to the image below. You will see that k1 and k4 when passed through the hashing function return index 2 in the array T. As a result, k1 is the head of the doubly-linked list, and k4 is the tail.

hash-table-linked-list

 

Hash Tables have the following time complexities:

Best: O(1)
Worst: O(1)
Average: O(1)

A detailed explanation of why it is O(1) can be found here. Essentially, as we have arrays which allow us to quickly check the index versus trees where we must first traverse the tree, we are able to achieve O(1) as by entering the key we immediately have the value.

Basic tree construction:

Breadth First Search (BFS) & Depth First Search (DFS) (Source)
I believe that I have reasonable understanding of these tree traversal algorithms, however I do not want to spread false false information. To better understand DSF and BFS, this video and link explain it nicely.

n-ary tree / b-tree / b+tree (Source)

n-ary
n-ary trees have up to n children

b-trees and b+trees (source)
b-trees and b+ trees are a balanced variant of the n-ary tree.

B-tree

B+tree
A b+ tree is more commonly used in every day applications, most notably mySQL (more info). Below we can see the max degree is three. This means that as a third element is entered into the b+ tree, a child node is created. If the max degree were 2, if we had a root node which contained 001, 002 we would only have one node. However, upon the insertion of 003, the root node would become 002 with the left leaf being 001 (a leaf being a node with no children) pointing to to 002 being in the same leaf as 003 (similar to how 0003 points to [0004, 0005] in the image below).

bPLUStree

It is important to note that b+ trees stores records in the leaf nodes. The intermediate numbers, that is to say the numbers between the leaf node and the root node are simply place-holders, to help the search algorithm find the correct data in the shortest amount of time. It is worth understanding how overflow and underflow occur when inserting or deleting records from a b+ tree. See: source

Search occurs in O(log n) time
Insert and delete occurs in O(logb(a)) time

Red Black Tree (balanced tree) (Source) (Video: Source)

Note: I’ve been taught Binary Search Trees, however as per the source, Red-Black trees are apparently an extension of BST. I’ll try cover this now:

inorder / postorder / preorder (source)
For all three examples, we will use the below tree.

Inorder preorder postorder

Preorder:
Start at the root, then follow the left tree, followed by the right tree:
Example: 7, 1, 0, 3, 2, 5, 4, 6, 9, 8, 10

Inorder:
Follow the numbers literally ‘in order’ (from 0 to 10)
Example: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Postorder:
Traverse the left-most nodes and list them. Then work your way up the internal nodes. Repeat for the right subtree.
Example: 0, 2, 4, 6, 5, 3, 1, 8, 10, 9, 7

 

Min/Max Heaps:

It’s definitely worth knowing how binary heap works. It’s a neat algorithm and I’ve found that you appreciate tree structures a lot more and understand them between when you can reheapify a heap into an ordered tree. This video explains it very nicely.

min-max-heap

Image source: Video

Understand + O() characteristics

When does it make sense to use one?

 

Graphs:

Objects and pointers, matrix, adjacency list. Know each ones pro and con.

Know graph traversal algorithms, their complexities and how to implement

 

Recursion:

Recursion is something which is also pretty hot in interviews. A good site to do interview type questions is CodingBat. Recursion made a lot of sense to me when I was writing Haskell and Prolog, but below is  what I think is a simplified way of understanding how recursion works. Again, with this it’s a matter of writing code to understand it.

recursion-product

Visualising Recursion

 

Operating systems:

Understand processes, threads, concurrency issues, locks, mutexes, semaphores, monitors. I have a module on concurrency this coming semester, so I won’t cover that just yet.

Process – A process is an instance of a program which is being executed.
Threads – Allow a program to have more than one line of execution at a time.
Concurrency Issues – Think SQL concurrency, locks and deadlock. (More info)
Understand deadlocks livelock and how to avoid them
Know some sheduling (LIFO/FIFO/Roundrobin)