Thoughts? The correct analogy is not a metal detector but a needle in a haystack. I think you underestimate the number of false leads you get doing this by several orders of magnitude. If your system generates too many false leads, it will be like adding more hay to a haystack that already hides the needle (to borrow an analogy from Bruce Schneier, whom you should read every week, and who writes about false alarms in the context of airport security).
Impediments? People lie about their habits. People do not accurately remember things throughout the day. How do you deal with missed days? If people miss days for different reasons, how do you deal with those gaps? You have to assume that the days they miss are different enough from the days they don't miss to invalidate the data of people who miss days.
I'm not sure whether you are referring to the same argument, but I think I remember Bruce Schneier arguing that the lack of confirmed positives is what makes data mining methods useless for terrorist incidents. His point was that very few positives for the system to learn from, combined with a very large number of variables, causes a huge number of false alarms (not sure if I remember all he said correctly).
So it's all about how many confirmed positives you have and whether you have reliable data on them. It works very well for credit card fraud because there is a sufficiently large number of actual fraud cases, you have reliable data about them, and false alarms do not cause major disruption.
I agree with your concern about the reliability and completeness of the data. I still think it's an interesting idea if there were a way to extract the data from a reliable source instead of working with what people claim to be the case.
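For what it's worth, here is a rough sketch of why rare positives produce mostly false alarms. All the numbers are made up for illustration (they are not from Schneier or from any real system); the function just applies Bayes' rule to a detector with a given hit rate and false alarm rate.

```python
def positive_predictive_value(base_rate, sensitivity, false_positive_rate):
    """Fraction of alarms that are real, by Bayes' rule."""
    true_alarms = base_rate * sensitivity
    false_alarms = (1.0 - base_rate) * false_positive_rate
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical detector: catches 99% of real cases, wrongly flags 1% of innocent ones.
# Hypothetical base rates: 1 in a million (very rare event) vs 1 in 100 (fraud-like).
rare = positive_predictive_value(base_rate=1e-6, sensitivity=0.99, false_positive_rate=0.01)
common = positive_predictive_value(base_rate=1e-2, sensitivity=0.99, false_positive_rate=0.01)

print(f"rare event:   {rare:.6f} of alarms are real")
print(f"common event: {common:.6f} of alarms are real")
```

With the same detector, the share of alarms that are real depends almost entirely on how common the real thing is, which is the gap between the terrorism case and the credit card case.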
LPTS,
You bring up good points. Self-reports are notoriously unreliable (like eyewitness testimony); however, they are still usable data. As for missed days, the questions can be ordered and repeated for those who miss. As for false correlations, you can reduce them by being careful about how you use and analyze the dataset. Multiple regression analysis can be dangerous; it's all in how it is used. If you limit the items you correlate to those for which you have a hypothesis, the data should at a minimum be valuable in that respect.
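To make the false-correlation point concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: 100 days of data, 200 tracked items that are pure noise, and a cutoff of |r| > 0.197, which is roughly the two-sided 5% level for 100 observations. It counts how many noise items "correlate" with the outcome when you test all of them versus only a handful picked in advance.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)          # fixed seed so the run is repeatable
N_DAYS = 100            # hypothetical number of daily reports
N_ITEMS = 200           # hypothetical number of tracked habits, all pure noise
CUTOFF = 0.197          # approx. two-sided 5% critical value of r for n = 100

outcome = [random.gauss(0, 1) for _ in range(N_DAYS)]
items = [[random.gauss(0, 1) for _ in range(N_DAYS)] for _ in range(N_ITEMS)]

# Test everything against the outcome vs. only 5 items chosen beforehand.
hits_all = sum(abs(pearson_r(item, outcome)) > CUTOFF for item in items)
hits_hypothesized = sum(abs(pearson_r(item, outcome)) > CUTOFF for item in items[:5])

print("spurious hits when testing all items:", hits_all)
print("spurious hits when testing 5 hypothesized items:", hits_hypothesized)
```

None of the items is actually related to the outcome, so every hit is a false lead; testing fewer, pre-chosen items simply gives chance fewer opportunities.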