Tuesday 3 December 2013

Sorts and Efficiency (SLOG topic for Nov. 28th)

During the week of Nov. 4th, we had already come to the conclusion that the Python built-in sort is the most efficient, followed by quick sort and merge sort (both O(n log n)), with selection sort being the least efficient (O(n^2)). Also, during lab09 we were asked to run each sort and record the time it took to sort a list with 1000, 2000, 3000, 4000, 5000, and 6000 items. We repeated the same steps for a sorted list as well as a reverse-sorted list. The results matched the race we did in class during the week of Nov. 4th.

During the second half of the Wednesday class, we revisited the quick sort algorithm. Professor Heap pointed out that if we run quick sort on an already sorted list (reversed or not), it takes longer: this is the worst-case scenario, where quick sort is O(n^2) instead of O(n log n). Normally, quick sort works by picking the first item as a pivot, putting all the items with values lower than the pivot in one list and all the items with higher values in another, and then calling quick sort on the two separate lists. The main problem is that the algorithm always takes the first item as the pivot. If the list is already sorted, then one of the two lists is empty, so the divide-and-conquer idea no longer applies. In addition, Python has a limit on recursion depth. If the input is a sorted list with 10000 items, quick sort would call itself 10000 times, which would probably exceed that limit. To solve this problem, we modified the code so that it takes an arbitrary item as the pivot (using the random module). When the input is large, the probability of the pivot being the smallest or largest item is very small, so this fixes the problem of sorting an already sorted list.
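A minimal sketch of the random-pivot version (my own rewrite, not the exact code from class):

```python
import random

def quick_sort(lst):
    """Return a new sorted list, choosing a random pivot so an
    already sorted input no longer triggers the worst case."""
    if len(lst) < 2:
        return lst[:]
    pivot = lst[random.randrange(len(lst))]  # random pivot instead of lst[0]
    smaller = [x for x in lst if x < pivot]
    equal = [x for x in lst if x == pivot]
    larger = [x for x in lst if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)
```

With a random pivot, each partition is usually reasonably balanced, so the recursion depth stays around log n even on sorted input.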

Names, Tracing, and Efficiency (Week of Nov. 25th)

We learned about names and tracing on Monday. In Python, every name refers to a value, and that value is the address of an object. When Python searches for a name, it goes in the order of local, nonlocal, global, and then built-in names. The professor also showed us an example with local, nonlocal, and global variables. Prior to this example, I never really understood what global does, and I had never encountered nonlocal before. I found the Python visualizer really helpful when going through the example, because it lists what each variable is assigned to in each frame (e.g. in the global frame, spam is assigned to “global spam”, while in the scope_test frame, spam is assigned to “nonlocal spam”).
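The example we traced is essentially the scope_test example from the official Python tutorial, which goes roughly like this:

```python
def scope_test():
    def do_local():
        spam = "local spam"       # binds a brand-new local name only

    def do_nonlocal():
        nonlocal spam             # rebinds spam in scope_test's frame
        spam = "nonlocal spam"

    def do_global():
        global spam               # rebinds spam in the global frame
        spam = "global spam"

    spam = "test spam"
    do_local()
    print("After local assignment:", spam)      # test spam
    do_nonlocal()
    print("After nonlocal assignment:", spam)   # nonlocal spam
    do_global()
    print("After global assignment:", spam)     # nonlocal spam

scope_test()
print("In global scope:", spam)                 # global spam
```

The surprising line is the last one inside scope_test: do_global changed the global spam, not the one in scope_test's frame, which is exactly what the visualizer shows.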
After finishing the Python memory model, Professor Heap started the topic of tracing. I found tracing not too complicated for most functions, as most of them are pretty straightforward. Determining the output of a function is not very hard as long as you understand what the function is supposed to return. However, when tracing a recursive function, the professor suggested that we should not trace too far. We should first trace the simplest non-recursive case, then trace the next most complicated case, plugging in the results obtained from the simplest case, and repeat.
On Wednesday, Professor Heap started off looking at the recursive Fibonacci function. He pointed out that the function makes a lot of redundant calls when computing the nth number in the Fibonacci sequence, especially when n is very large. However, if we record each number in the sequence in a table, the efficiency can be improved. Of course, a lot of programmers had already realized this, and they call it memoization. Memoization records values in a table (usually a dictionary), and when the function is called, it looks up the corresponding value in the table. The original function only runs when the input is not in the table; after computing the output, it is recorded in the table to prevent redundant calls. In this way, the efficiency is improved by a lot, so computing a large Fibonacci number no longer takes so long!
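A minimal sketch of this table-lookup idea using a dictionary (the names here are my own choosing, not the code from class):

```python
def memoize(f):
    """Wrap f so that results are recorded in a table (a dict)."""
    table = {}
    def wrapper(n):
        if n not in table:
            table[n] = f(n)   # only call f when the answer is not recorded
        return table[n]
    return wrapper

@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(100)  # finishes instantly instead of making an astronomical number of calls
```

Without the table, the plain recursive version recomputes fib(98) twice, fib(97) three times, and so on; with it, each fib(k) is computed exactly once.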

Mid-Term 2 Results, Property, and MapReduce (Week of Nov. 18th)

The TAs for this course mark really quickly! It had only been a week and Professor Heap had already returned the mid-term to us. I did generally well on this mid-term, and I believe the result truly reflects my knowledge of recursive tree structures (including binary trees as well as binary search trees). Question 1 asked us to return a tuple containing the number of interior nodes and the number of leaves in a tree. Since I was not really familiar with tuples, I decided to write two helper functions within the count_node function: one would count the number of interior nodes while the other would count the number of leaves. count_node would then call the two helpers and return the intended result as a tuple. Question 2 asked us to return the number of nodes with even values. This question was pretty straightforward; the code was very similar to the function that counts the number of nodes in a whole tree, just with a slight modification. Question 3 asked us to build a linked list of the path from the root to the minimum element. It was really similar to the question we did in lab08, where we had to use a linked list structure to represent the inorder traversal of a recursive tree. However, for some reason I checked the right child of the BST when writing my answer. Consequently, I got 2 marks off question 3 (and the whole mid-term). Nonetheless, I am still very satisfied with the result!
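Reconstructed from memory, my question 1 approach looked roughly like this (the BTNode class is my guess at the node class given on the test):

```python
class BTNode:
    """A binary tree node with a value and optional children."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def count_node(t):
    """Return (number of interior nodes, number of leaves) in tree t."""
    def interior(t):
        if t is None or (t.left is None and t.right is None):
            return 0                       # empty trees and leaves don't count
        return 1 + interior(t.left) + interior(t.right)

    def leaves(t):
        if t is None:
            return 0
        if t.left is None and t.right is None:
            return 1                       # a node with no children is a leaf
        return leaves(t.left) + leaves(t.right)

    return (interior(t), leaves(t))
```

For example, a root with two leaf children gives (1, 2): one interior node and two leaves.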
This week we learned about property as well as MapReduce, both of which were very new to me. One use of property is to restrict the attributes of an object. property takes four arguments: a function that gets the attribute, a function that sets it, a function that deletes it, and a docstring; they are all None by default. Professor Heap only went through the first two arguments (get and set). At first I was really confused in class, because I had no idea when the program would invoke the get or set function. However, after reading the background material posted on the CSC148 website, I was more comfortable with property, as I got to see different examples on the internet. Similar to property, MapReduce was very new to me, especially the reduce part. I did not know how reduce works at first, so I asked a friend to explain it to me. Now I understand that reduce combines the elements of an iterable into a single reduced value. I believe that by reading the background material and consulting friends in computer science, I have a better grasp of how to use property and reduce.
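To make sure I understood both ideas, I put together a small sketch (the Temperature class and its attribute names are just my own toy example, not one from class):

```python
from functools import reduce

class Temperature:
    def __init__(self, celsius=0.0):
        self._celsius = celsius

    def _get_celsius(self):
        return self._celsius

    def _set_celsius(self, value):
        if value < -273.15:             # restrict what the attribute may hold
            raise ValueError("below absolute zero")
        self._celsius = value

    # property(getter, setter, deleter, docstring); deleter left as None
    celsius = property(_get_celsius, _set_celsius, None, "Temperature in Celsius")

t = Temperature()
t.celsius = 25          # assignment invokes _set_celsius behind the scenes
print(t.celsius)        # attribute access invokes _get_celsius

# reduce combines the elements of an iterable into one value, left to right
total = reduce(lambda a, b: a + b, [1, 2, 3, 4])   # ((1 + 2) + 3) + 4
```

The key point for me was that plain attribute syntax (t.celsius = ...) is what triggers the get and set functions; callers never call them directly.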

Thursday 14 November 2013

Reviewing for Mid-Term 2 (Week of Nov. 11th)

There was no class on Monday and Tuesday, so I spent most of the long weekend studying for the mid-term on Wednesday. First, I made notes based on the lecture slides and the extra readings posted on the CSC148 website. I believe that in the process of making notes, I memorized the material better. It also made it easier to review before the exam, because I could just focus on studying the notes instead of going through all the readings and examples posted. I spent the most time on the "Binary Search Tree" section, because I found the insert and delete methods confusing. Each of them had four cases (if/elif/else clauses), and though very similar at first glance, they were actually different. I also went through the terminology used in recursive tree structures, as well as linked structures.
After finishing my notes, I started to look at the lab documents. I tried to re-do the lab questions starting from lab 06. Of course I took less time when re-doing them, because I already had ideas about how to implement each function from lab time. However, after finishing the lab questions, I compared the two versions of my code: they were very similar, but the first one had some redundant lines, which made the code less efficient. Afterwards, I did the same thing with the exercises, starting from exercise 4. I realized I spent significantly less time completing the exercises, and the implementations were more efficient.
On Monday night, I finished all of the lab and exercise questions, so I didn't know what to do next. I started making up questions related to the content. For example, in lab08 we had to represent the inorder traversal of a recursive tree in a linked list structure, so I tried preorder and postorder traversal on my own, and it wasn't too hard. I also tried to write a function that does selection sort on a linked list structure (assuming the data each node holds is a float). The selection sort method was a bit harder, because I had to write a few helper methods, such as a function that finds the node holding the minimum value in a linked list. Nevertheless, when I finished coding, I felt very accomplished.
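My made-up linked-list selection sort went roughly like this (LLNode and the helper names are my own; I swap node values rather than relinking nodes, which keeps the pointers simple):

```python
class LLNode:
    """A singly linked list node holding a float."""
    def __init__(self, data, nxt=None):
        self.data, self.nxt = data, nxt

def min_node(node):
    """Helper: return the node holding the minimum value from node onward."""
    smallest = node
    while node is not None:
        if node.data < smallest.data:
            smallest = node
        node = node.nxt
    return smallest

def selection_sort(head):
    """Sort the list in place: for each position, swap in the minimum
    of the remaining (unsorted) portion."""
    node = head
    while node is not None:
        smallest = min_node(node)
        node.data, smallest.data = smallest.data, node.data  # swap values
        node = node.nxt
    return head
```

This mirrors the array version exactly: the portion before the current node is always sorted, and min_node plays the role of scanning the unsorted tail.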

Running Time Analysis (Week of Nov. 4th)

This week, we learned more about running time analysis, specifically by analyzing three different sorting methods. Again, since I do not have a solid background in computer programming, I only knew the concept behind bubble sort. I really appreciate that Professor Heap used cups of different heights to visualize how each sorting method works. I am pretty sure that if I were to look at the code alone, it would take me hours to understand the idea behind each sorting method and how it works. The result of the race among the different sorts, including count sort and the Python built-in sort, was also really interesting. For instance, even though both quick sort and merge sort are O(n log n), one still outpaces the other by a constant factor.

Speaking of which, I have trouble understanding Big-Oh analysis. When going through the 3 sorting methods in class, it took me a while to understand why each sort corresponds to its running-time function. Selection sort is O(n^2) because during the ith iteration, the program has to run through (n - i) items in the list. So, adding up all the iterations, selection sort takes n + (n - 1) + (n - 2) + ... + 2 + 1 steps. By Gauss's formula, the number of steps is n(n + 1)/2, so selection sort is O(n^2). Similarly, merge sort is O(n log n) because the merge_sort function splits the list into two smaller lists (giving O(log n) levels of splitting), and the merge function has to go through each item in list 1 and list 2, of which there are n in total (i.e. O(n) work per level). Combining the two, we get O(n log n). However, quick sort is the one I have the most trouble with. I understand the "log n" part, because quick sort uses the divide-and-conquer method. Nevertheless, I do not understand where the "n" comes from. I should probably read the background readings again and see if there is an explanation. If I still do not understand, I will do some research through Google, and hopefully there will be an explanation of why quick sort is O(n log n).
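To convince myself of the Gauss-style count, I counted selection sort's comparisons directly. (This counts (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons, slightly fewer than the n(n+1)/2 total above since the last pass compares nothing, but the O(n^2) conclusion is the same.)

```python
def selection_sort_steps(lst):
    """Run selection sort on a copy of lst and return the number of
    comparisons performed."""
    lst = lst[:]
    steps = 0
    for i in range(len(lst)):
        smallest = i
        for j in range(i + 1, len(lst)):   # (n - i - 1) comparisons this pass
            steps += 1
            if lst[j] < lst[smallest]:
                smallest = j
        lst[i], lst[smallest] = lst[smallest], lst[i]
    return steps

selection_sort_steps(list(range(100)))   # 100 * 99 // 2 = 4950 comparisons
```

Doubling n quadruples the count (4950 vs. 19900 for n = 200), which is exactly the quadratic growth the formula predicts.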

Sunday 3 November 2013

More on Binary Trees (Week of Oct. 28th)

This week, we studied a more advanced binary tree structure: the binary search tree. I believe the binary search tree is more efficient when inserting and deleting nodes, just as binary search is more efficient than linear search. In addition, we also learned about running time analysis. I really appreciate that Professor Danny Heap decided to go through the running time analysis section now, because in CSC165, Professor Tom Fairgrieve is also teaching us how to analyze the efficiency of a program by checking the average number of steps, the maximum number of steps (worst case), as well as the minimum number of steps (best case). I found it easier to understand the concept of running time analysis because both professors are teaching closely related material, and I am able to grasp the ideas better.
As for assignment 2, I found it shorter compared to assignment 1. It also required less reading of existing code. For instance, in assignment 1 I had to first understand what each class or method does before coding it; I remember taking a long time to understand the "add" method because I did not know what it was supposed to do. In assignment 2, however, the instructions were pretty clear and the programming part wasn't that hard. Nevertheless, I did have trouble matching a string against a regular expression that contains "*". As stated in the assignment 2 guidelines, "A regex of the form r + '*' matches string s if.....s has the form s1 + s2 +......+ sk where k > 0 and r matches every si". I had trouble determining where to cut the string s so that each piece matches the regular expression. My friend and I are still working on the star-node part of regex_match. Hopefully we will manage to figure it out!


Saturday 2 November 2013

Mid-Term Review (Week of Oct. 21st)

We finally got our mid-term back! Overall, I did pretty well, but I cannot believe I misread the first question and did max of the tuple instead of min. I only got parts (a) and (b) of the first question right. For the second question, we were supposed to write a function that joins the strings in a (potentially nested) list together. I did question 2 in the last 10 minutes of the test, so instead of writing one single function, I first wrote a helper function that flattens a nested list, then a function that joins a flat list together. However, I did try question 2 again at home, without writing any helper method. Even though it took me more than 10 minutes, I am glad that it worked. For question 3, we were supposed to write two methods for a class: function_append, which returns a copy of the list with an item appended, and function_sort, which returns a copy of the list, sorted. At first, I just created a copy of the list and finished the rest of the definition. However, I noticed that the two functions were supposed to return an instance of FunctionalList, so I rewrote the code to return a FunctionalList. As for part (b), I forgot to include the empty case among the test cases. Consequently, I lost a couple of marks on part (b). Overall, I think I did pretty well on this mid-term, but it could have been better.
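My single-function attempt at home looked something like this (reconstructed from memory, with names of my own choosing):

```python
def join_strings(lst):
    """Join the strings in a possibly nested list into one string,
    without a separate flatten helper."""
    result = ''
    for item in lst:
        if isinstance(item, list):
            result += join_strings(item)   # recurse into nested sublists
        else:
            result += item                 # item is a plain string
    return result

join_strings(['a', ['b', ['c']], 'd'])   # 'abcd'
```

The trick that eluded me under time pressure is that the recursive call itself returns an already-joined string, so flattening and joining happen in one pass.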

As for class, this week we learned about linked structures. We used a linked list to implement the stack ADT, and we also created a tree using a node class and a tree class. However, I still find the concept behind linked structures confusing, especially when you have to insert or delete a node. For instance, during lab it took me a long time to figure out how to write the dequeue method. It still takes me some time to think through how "self.top" and "self.last" change when a node is added or removed. However, I do find diagrams really helpful when coding linked structures. I think I will get better at them once I practice more and read more examples.
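To practice, here is a sketch of how "self.top" and "self.last" change in a linked queue (my own minimal version, not the exact lab code):

```python
class Node:
    def __init__(self, data, nxt=None):
        self.data, self.nxt = data, nxt

class LinkedQueue:
    def __init__(self):
        self.top = None    # front of the queue (where we dequeue)
        self.last = None   # back of the queue (where we enqueue)

    def enqueue(self, data):
        node = Node(data)
        if self.last is None:        # empty queue: both pointers move
            self.top = self.last = node
        else:
            self.last.nxt = node     # link behind the old last node
            self.last = node

    def dequeue(self):
        node = self.top
        self.top = node.nxt          # front moves forward one node
        if self.top is None:         # queue became empty: reset last too
            self.last = None
        return node.data
```

The two tricky cases are exactly the ones the diagrams helped with: enqueueing into an empty queue (both pointers change) and dequeueing the only node (both must become None).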