Machine Learning

Thursday, September 15, 2005

Consolidation

Hi all. I've consolidated my computer science blog, my personal blog, and my website. Everything can now be found at http://zigoku.net . Thanks for reading, and see you there.

Monday, March 28, 2005

UAI paper on monotonicity constraints

Here is the abstract from the paper that Tom Dietterich, Angelo Restificar, and I have just submitted to UAI:

"When training data is sparse, more domain knowledge must be incorporated into the learning algorithm in order to reduce the effective size of the hypothesis space. This paper builds on previous work in which knowledge about qualitative monotonicities was formally represented and incorporated into learning algorithms (e.g., Clark & Matwin's work with the CN2 rule learning algorithm). We show how to interpret knowledge of qualitative influences, and in particular of monotonicities, as constraints on probability distributions, and to incorporate this knowledge into Bayesian network learning algorithms. We show that this yields improved accuracy, particularly with very small training sets."

Full text in pdf or in ps
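
To make "constraints on probability distributions" a bit more concrete, here is a small sketch of one standard way to read "X positively influences Y" as a stochastic-dominance constraint on a conditional probability table. This is not code from the paper; the function name and the example CPTs are purely illustrative.

  import numpy as np

  def satisfies_positive_monotonicity(cpt, tol=1e-9):
      """cpt[i, j] = P(Y = y_j | X = x_i), with rows ordered by increasing x
      and columns by increasing y. Returns True if larger values of X never
      make larger values of Y less likely (first-order stochastic dominance)."""
      # Tail probabilities P(Y >= y_j | X = x_i) for each row.
      tails = np.cumsum(cpt[:, ::-1], axis=1)[:, ::-1]
      # Each tail curve must be (weakly) higher for larger x.
      return bool(np.all(np.diff(tails, axis=0) >= -tol))

  # Y's distribution shifts upward as X increases: constraint satisfied.
  cpt_ok = np.array([[0.7, 0.2, 0.1],
                     [0.4, 0.4, 0.2],
                     [0.1, 0.3, 0.6]])
  # Rows reversed, so Y shifts downward as X increases: constraint violated.
  cpt_bad = cpt_ok[::-1]
  print(satisfies_positive_monotonicity(cpt_ok))   # True
  print(satisfies_positive_monotonicity(cpt_bad))  # False

In the Bayesian network setting, constraints of this kind restrict the feasible CPT parameters during learning, which is how a qualitative statement can shrink the effective size of the hypothesis space.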

Saturday, March 26, 2005

Why I decided to work for Google

Cross-post on my other blog.

Friday, March 25, 2005

New research tool

In case you haven't already, check out http://scholar.google.com/ . It looks pretty nice.

Thursday, March 10, 2005

More prior work

Potharst and Feelders have an article in SIGKDD Explorations (June 2002) about learning monotonic classification trees from data that may be monotonic or non-monotonic (i.e., contradict the prior). There's no probabilistic machinery in there, but it's still interesting to see that others are pursuing similar routes. Read here.

Wednesday, March 09, 2005

Qualitative and quantitative priors

Expert domain knowledge is usually qualitative, not quantitative, and thus eliciting probability numbers from experts is usually very difficult. One of the core goals of the KI-Learn project is to use qualitative knowledge, and we attempt this with a language whose qualitative statements
  1. are easy and natural for domain experts to write, and
  2. have well-defined semantics in terms of probability distributions that correspond to experts' intuitions.

However, things aren't so simple. Suppose our training data contradicts an expert's qualitative statement. The proper posterior depends on how much we believe the domain knowledge and on how much data we have. In other words: how strong is the expert's prior? This is where we are forced right back into specifying quantitative aspects of our model (i.e., the numbers that parameterize the expert's prior).

So here is my question: is it ever possible to specify purely qualitative domain knowledge? I suppose the answer is: only if you assume the knowledge is true with probability 1 (which of course is simply making the quantitative part implicit). This is nasty, though. Nobody wants to state something is true with probability 1, but nobody wants to specify probabilities, either. Is there any alternative to picking the lesser of these two evils? It seems the answer is no...?
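
As a toy illustration of the problem (my own sketch, not KI-Learn machinery): suppose the expert's qualitative claim is "theta >= 0.5" and the data disagree. What the posterior does depends entirely on a quantitative choice, namely how strongly the prior encodes the claim. The grid, the prior shapes, and the boost factors below are all arbitrary.

  import numpy as np

  successes, trials = 3, 20                      # data that contradict the expert
  theta = np.linspace(0.001, 0.999, 999)         # grid over the parameter
  likelihood = theta**successes * (1 - theta)**(trials - successes)

  def posterior_mean(prior):
      post = prior * likelihood
      post /= post.sum()
      return float((theta * post).sum())

  flat   = np.ones_like(theta)                   # no domain knowledge
  weak   = np.where(theta >= 0.5, 2.0, 1.0)      # mild preference for theta >= 0.5
  strong = np.where(theta >= 0.5, 100.0, 1.0)    # strong preference
  hard   = np.where(theta >= 0.5, 1.0, 0.0)      # "true with probability 1"

  for name, prior in [("flat", flat), ("weak", weak),
                      ("strong", strong), ("hard", hard)]:
      print(name, round(posterior_mean(prior), 3))

  # The soft priors give posterior means around 0.18-0.20, i.e., the data win;
  # the hard constraint pins the posterior just above 0.5, however much
  # contradicting data arrives.

So the choice between the two evils shows up concretely: either pick a boost factor (a number), or accept the truncated, probability-1 prior.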

Thursday, March 03, 2005

KIML: future work

Examples of valuable domain knowledge beyond monotonicities include synergistic influence (two things both positively influence an outcome, but their combined effect is greater than merely additive) and relative strength of influence (two things both influence an outcome, but one is known to be a significantly stronger predictor than the other). We have not yet run experiments to test the value of these statements, but we do have defined mappings from them to constraints on probability distributions, and we expect results similar to those obtained for monotonicities.
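
For concreteness, here is roughly how those two statements could be written as inequality constraints on P(Y=1 | A, B). These formalizations are just my shorthand for this post, not the precise mappings defined in the thesis.

  def is_synergistic(p):
      """p[a][b] = P(Y=1 | A=a, B=b). "A and B influence Y synergistically":
      the boost from turning B on is strictly larger when A is already on
      (a super-additive interaction)."""
      return (p[1][1] - p[1][0]) > (p[0][1] - p[0][0])

  def a_stronger_than_b(p):
      """"A is a stronger influence than B": flipping A moves P(Y=1) by more
      than flipping B does, whichever value the other variable is held at."""
      effect_a = min(p[1][0] - p[0][0], p[1][1] - p[0][1])
      effect_b = max(p[0][1] - p[0][0], p[1][1] - p[1][0])
      return effect_a >= effect_b

  p = {0: {0: 0.10, 1: 0.15}, 1: {0: 0.50, 1: 0.80}}
  print(is_synergistic(p), a_stronger_than_b(p))   # True True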

Longer-term "knowledge-intensive machine learning" goals at Oregon State are more ambitious: automatic feature engineering, model simplification, etc.

By the way, for those of you interested in the details, I can provide a current draft of my thesis (especially if you are willing to offer constructive criticism :-).