GUHA Method, LISp-Miner system and Data Mining
Dr. Jan Rauch and Dr. Milan Simunek
University of Economics, Prague, Czech Republic

Friday, Oct 26, 2007
3:00 p.m. - 4:00 p.m.
Woodward Hall, Room 106

Complete Description:
The GUHA method is an original Czech method of exploratory data analysis. Its principle is to offer
all the interesting facts that the given data yield in relation to the given problem. GUHA is realized
by GUHA procedures. The input of a GUHA procedure consists of the analyzed data and a few
parameters defining a very large set of potentially interesting patterns. The output is a list of all
prime patterns. A pattern is prime if it is true in the analyzed data and does not immediately
follow from other, simpler output patterns.
GUHA has been under development since the 1960s. The oldest GUHA procedure is the ASSOC procedure, which mines for association rules. It mines not only for "classical" association rules with confidence and support, but also for additional association rules describing various relations between two Boolean attributes, including relations corresponding to statistical hypothesis tests. The ASSOC procedure has been implemented several times. The implementation is not based on the well-known Apriori algorithm; instead, it uses suitable strings of bits to represent the analyzed data.

The most widely used GUHA procedure today is the 4ft-Miner procedure, which mines for various types of association rules, including conditional rules. Software tools that compute various contingency tables very quickly were developed for the 4ft-Miner procedure. These tools were then used to implement five additional GUHA procedures. All of these procedures are included in the academic software system LISp-Miner, http://lispminer.vse.cz . The GUHA procedures of the LISp-Miner system mine for various patterns that are verified using one or two contingency tables. All of these procedures offer very fine-grained tools for adjusting the set of relevant patterns to be generated and verified. The LISp-Miner system has been applied many times to practical data mining tasks.

Several research activities are also related to GUHA methods and to the LISp-Miner system. They concern, in particular, logical calculi for data mining, applications of semantics in data mining, analytical reports summarizing the results of data mining, and automatic conversion of generalized association rules into natural-language sentences; see http://lispminer.vse.cz/research/. The lecture will introduce the main features of LISp-Miner and describe several examples of its applications. In addition, an overview of related research results will be given.
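
To make the bit-string idea in the description above concrete, here is a minimal Python sketch. It is not taken from LISp-Miner or any ASSOC implementation; the function names, the sample data, and the thresholds are all hypothetical. Each Boolean attribute is packed into one integer (bit i is set when row i satisfies the attribute), and a four-fold contingency table for a rule "antecedent => succedent" is then obtained with a few bitwise operations. An optional condition mask restricts the table to a subset of rows, which is one plausible way to read "conditional rules".

```python
from dataclasses import dataclass


def to_bits(values):
    """Pack a sequence of truthy/falsy values into one int (bit i = row i)."""
    bits = 0
    for i, v in enumerate(values):
        if v:
            bits |= 1 << i
    return bits


def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


@dataclass
class FourFoldTable:
    a: int  # rows satisfying antecedent and succedent
    b: int  # antecedent but not succedent
    c: int  # succedent but not antecedent
    d: int  # neither


def four_fold(ant, suc, n_rows, cond=None):
    """Four-fold table of ant/suc over all rows, or only rows in cond."""
    full = (1 << n_rows) - 1
    mask = full if cond is None else cond & full
    a = popcount(ant & suc & mask)
    b = popcount(ant & ~suc & mask)
    c = popcount(~ant & suc & mask)
    d = popcount(mask) - a - b - c
    return FourFoldTable(a, b, c, d)


def confidence(t):
    return t.a / (t.a + t.b) if (t.a + t.b) else 0.0


def support(t):
    n = t.a + t.b + t.c + t.d
    return t.a / n if n else 0.0


def holds(t, min_conf, min_count):
    """Threshold check: confidence >= min_conf and at least min_count hits."""
    return confidence(t) >= min_conf and t.a >= min_count


# Hypothetical toy data: 8 rows, three Boolean attributes.
smoker = to_bits([1, 1, 1, 1, 0, 0, 0, 1])
cough = to_bits([1, 1, 1, 0, 0, 1, 0, 1])
over_50 = to_bits([1, 1, 1, 1, 0, 0, 0, 0])

table = four_fold(smoker, cough, n_rows=8)
cond_table = four_fold(smoker, cough, n_rows=8, cond=over_50)
```

With this toy data, `holds(table, 0.75, 3)` checks the rule "smoker => cough" against a confidence threshold of 0.75 and a minimum count of 3. Other relations between two Boolean attributes, including ones based on statistical tests, could be expressed as different predicates over the same four counts, which is why the table is computed once and kept separate from the check.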
Bio:

Jan Rauch graduated in 1972 from the Faculty of Mathematics and Physics, Charles University, Prague, specializing in numerical mathematics. He obtained his Ph.D. in mathematical logic from the Institute of Mathematics of the Czechoslovak Academy of Sciences in 1986. In 1999 he became an associate professor at the Faculty of Informatics and Statistics of the University of Economics in Prague. Both his dissertation and his habilitation work are related to the logical foundations of knowledge discovery in databases. He has published over 80 specialized works, including book chapters, journal papers and contributions to the proceedings of important international conferences. In 1999 he was co-chair of the program committee of the PKDD (Principles and Practice of Knowledge Discovery in Databases) conference, in 2000 he was the workshop chair, and in 2007 the tutorial chair of PKDD. In addition, he is a member of the program committees of about 30 international conferences and workshops.

Milan Simunek graduated in 2001 from the Faculty of Informatics and Statistics, University of Economics, Prague, specializing in informatics. He obtained his Ph.D. in Applied Informatics from the same university in 2004. Both his dissertation and his habilitation work are related to automated computer processing of unstructured texts. He has published over 40 specialized works, including conference papers, journal papers, lecture notes, book chapters and one book. His main focus is bespoke software development, including development of the LISp-Miner system for KDD, the ELVIRA prediction system for gas consumption forecasting, a booking system for hotels, the Praha4D 3D virtual reality project, and others.