Data Mining

Winter, 2010

20-260-858-001

Midterm Examination

--- take home examination ---

DRAFT


due:

What you turn in must be typed. You can slide it under your professor's door or send it in as a PDF.

(1) 3 points: List all of the individual papers discussed in this course in the format of: author, title, where the paper appeared, date. You only have to list the last name of the author. You are to include papers considered from before the midterm. Do not consider the Han book to be a paper. List the papers vertically. The authors must be listed alphabetically.

(2) 10 points: List the upper bound of the complexity for each paper. List it in the form of: (paper author, complexity equation)

(3) 10 points: Fill out and form a table of the following format. Two different jargon words are required. They must be those that can be used to identify this paper and separate it from other database mining papers. These words can be considered to be keywords. Thus, terms such as "mining" or "learning" are not useful as they do not separate one paper from the other. The authors must be listed alphabetically.

(4) 10 points: Fill out and form a table of the following format. The term "basis for search" means what is used to make choices, not search strategies such as "top down". For example, Mazlack uses cohesion. The authors must be listed alphabetically. Indicate whether the method is inductive or deductive.

(5) 9 points: Rank order the relative grain size of each paper from large to small. One author on each line. Largest grain size first. Group the papers in grain size of 1 to 10 where 10 is the largest. Show all of the papers of grain size 1 together, then grain size 2 together, etc.

(6) 9 points: Rank order the relative degree of supervision of each paper from large to small. One author on each line. Smallest degree of supervision first. Group the papers in supervision level of 1 to 10 where 10 is the largest. Show all of the papers of supervision level 1 together, then supervision level 2 together, etc.

(7) 9 points: Identify three pairs of papers that are similar in their results and describe why the papers in each pair are similar. No paper may appear in more than one pair. This means that you may choose six papers. The papers must be as similar in results as possible. Part of your evaluation will depend on how well you choose similar papers.

(8) 9 points: Identify three pairs of papers that are similar in their methods and describe why the papers in each pair are similar. No paper may appear in more than one pair. This means that you may choose six papers. The papers must be as similar in methods as possible. Part of your evaluation will depend on how well you choose similar papers.

(10) 31 points: A number of workers have written on determining causality in data; among them are Pearl, Silverstein, Hobbs, and Simon. (Pearl has written several articles and a book on the topic.)

Papers by Hobbs and Silverstein are linked from the course's schedule page.

You are to write a short article of at least 500 words on causality that integrates the work of Pearl, Silverstein, Hobbs, and two other authors of your choice and discovery. (Do not include anything by Mazlack.) By "integrate," it is meant that you are to at least compare the papers as to: (a) the closeness of their approaches, (b) the kind of data that each might be suitable for, (c) their relative complexity, and (d) at least one other basis of comparison of your choosing.

Include hard copies of the papers that you use (other than those available from the course web page).

Some starting places:

last changed: 17 December 2009