Is there a way to get the Drill through to work with model viewer controls? Any workaround?
Also, where would I find more complex data mining models for review? SQLServerDataMining site, Microsoft data mining tutorial site, and Data Mining with SQL 2005 book, all checked. Anything else?
-Young K
Do you mean for the redistibutable winform controls or the webform sample controls?
For "more complex models", what kind of models are you looking for?
|||Support for Drillthrough in the winform viewer controls will be available from Yukon SP2 onwards through a new interface. We'll post a sample on how to use it once SP2 is publicly available.
You can build a complex model using the Movies dataset from sqlserverdatamining.com which has nested tables.
|||Thanks, Jamie and Shuvro. I'm using winform.
Any idea on when Yukon SP2 is due to be released?
I'm trying to balance the benefit (and complexity) of nested tables vs case table model. You can flatten most scenario into case model for simplicity, but nested model seems to fit better in many cases. Is there accuracy benefit to nested models? What would be rule of the thumb? If you can get away with case table model, stay with case table model?
-Young K
|||
One more thing. What are your thoughts on multi-level nested tables?
-Young K
|||SP2 will be available the first quarter of 2007. A community technical preview (CTP) build should be available for download within the next few weeks.
Regarding nested tables, I would go with nested tables if your data is in that format. There should be no accuracy difference, there is only a modeling difference. For example, if you have a nested table, an algorithm can "know" that the attributes from the nested table are related to each other. For example I could have "Products Bought in May" and "Products Bought in June".
Additionally I could have multiple columns in my nested table. If I have for example Product Name, Quantity, and Discount, the attributes are already set up that Quantity of Milk is related to Discount of Milk is related to the existence of Milk. The modeling is already done for me.
Another issue is data density. At the case level, the data is "rectangular" - each attribute(column) has the set of values present in the data plus the value "missing". When a NULL value is encountered, the framework sends the "missing" value to the algorithm. (Currently this adds to the total state count whether missing data was encountered or not, which is why it always has a non-zero probability when you look at trees, for example). This means that if there are 100 columns in a case, the frame work will send 100 attributes to the algorithm during training even if most of them are NULL. When you use nested tables, the data is no longer "rectangular" it is "ragged" - e.g. the cases are of variable length. If you have 5 case columns and 95 nested attributes, and a particular case only has 4 attributes in its nested table, the framework will only send 9 attributes to the algorithm. Note that this does add complexity to how the algorithms consume data - i.e. they can't rely on the cases being the same length so the code has to handle many more circumstances, but that's the algorithm developers problem, not the data miners problem.
Finally, nested tables allow you to describe an unlimited number of attributes. Using case columns only, you are essentially limited to the width of your RDBMS - in SQL Server 1024 columns = 1024 attributes. We have test models with 750,000 attributes.
I would use nested tables where natural to do so - it gives you much more flexibility and capability than you may think at first site.
|||We don't support multi-level nested tables. While way back when we came up with the concept for SQL 2000 data mining we considered it, we discovered that the use cases are too limited and the code would be too complex to make it worthwhile implementing.
The only use case we could think of is examining pages of books that people read. (or objects of a category that are in a particular case) For example I would have a nested table of books and under that I would have a nested table of pages. Each case would only have the pages read by the case level individual.
However, this really posed many questions. Was is that valuable to have the results broken down by book? Your nested table source would look like
CustID BookID Page
You could easily do this without another nested table, unless you wanted to also have information about the page in particular, e.g. reading time, a la
CustID BookID Page ReadingTime
Then you could model the nested -nested table. However, are the complexities added to the language semantics worth this (somewhat limited) use case? Is the workaround of modeling your data such as
CustID (BookID & PageID) ReadingTime
good enough?
In the end we decided that the additional complexities weren't justified at the time. If we get a large enough cohort of customers with production blocking use cases that we just have to have it, then we would definately reconsider. However, considering that the flexibility of nested tables is already leaps and bounds ahead of what is offered elsewhere in the market, I think it would be quite a while before we needed to head down that path.
Thanks
-Jamie
|||Thank you, Jamie. You are the man.
-Young K.
No comments:
Post a Comment