It may be a bit on the esoteric side, but this usually spurs some thoughts.
It is edited for length and 'cause they are not paying me to pass this
along.
______________________
Great Optimism,
Dutch Driver
Abilene, TX
Hm. Telephone: 915.698.7217
mailto:ddriver@cs1.mcm.edu
---------- Forwarded message ----------
Date: Tue, 11 Nov 1997 15:35:59 -0500
From: agent@gordianknot.com
To: ddriver@cs1.MCM.edu
Subject: 97Q4 Gordian Newsletter
=================GORDIAN INSTITUTE ELECTRONIC NEWSLETTER==================
November 11, 1997 | This edition:
---------------------------------------
1. Gordian's Curriculum Expands with Two New Offerings
2. Staff Article: "Data Mining... Or Just Picking Up Rocks?"
by Thomas "Tony" Rathburn
3. A New On-line Executive Journal for Data-Intensive
Decision Support is Launched on October 7
4. G6G's Directory of Intelligent Software has been Upgraded
5. Newsletter summary and unsubscribe instructions
__________________________
The Gordian Institute
http://www.gordianknot.com agent@gordianknot.com (800) 405-2114
__________________________
Additional details for both courses, including specific dates, training
sites, and detailed course outlines, may be obtained through any of the
following:
- Web: http://www.gordianknot.com
- Toll Free: 800-405-2114
- Direct: 281-364-9882
- Fax: 304-547-4203
- Email: agent@gordianknot.com (Reply with either of the following as the SUBJECT)
  - Data Mining Details
  - Financial Markets Details
--------------------------------------------------------------------------
2. Staff Article
DATA MINING... OR JUST PICKING UP ROCKS?
- by Thomas "Tony" Rathburn
INTRODUCTION
For decades, businesses have recognized the potential of the vast quantities
of data they collect. Transaction processing systems hold the potential
to reveal dramatic improvements in the way businesses operate. The
methods employed to extract this information have been diverse.
Early efforts relied largely on statistical techniques to capture basic
relationships and descriptive facts. As analysts and technology became
more sophisticated, other tools were tested.
Today, a wide variety of tools and techniques exist for the extraction of
information content from enterprise data. Managing this effort requires a
diverse set of skills ranging from project management to technical
expertise to domain-specific knowledge.
There is no magic in this effort. The concept of artificial intelligence
has not lived up to its hype. The reality is that the search for
information is specific to the goals of the project. It is unrealistic to
expect any technology to conceive the problem in a manner consistent with
any specific organization. It is realistic to develop a tool-box of
techniques to analyze data for the development of improvements in decision
making processes.
No one technology provides all the answers. The organizations achieving
the best results understand the strengths and weaknesses of the
technologies they employ, and they understand the problem they are working
on. They integrate their tools to achieve incremental improvements in
specific applications. They are driven by goals. They are not walking
around picking up rocks to see if any one might have value to them.
INFORMATION EVOLUTION
A history of industrial evolution might include significant events like
the discovery of fire, the invention of the wheel, the first printing
press, the cotton gin, and the development of the assembly line. This
evolution would also have to include the conceptual development necessary
to move from tradesmen operating on an individual basis to multinational
organizations employing the specialized skills of hundreds of thousands of
people in a coordinated manner.
The information evolution can also be examined in this manner. The
development of technology like the computer, the personal computer, mass
data storage devices, the modem and communications devices, networking
technology, and the Internet will undoubtedly be seen as significant from
a historical perspective.
We are also witnessing the conceptual development of information
processing. We have moved from centralized control of large
transaction-based systems to PC-based reporting and analysis systems. Recently, we
have seen the integration of these individual efforts into networked
systems. As organizations reinforce the value of these individual and
small group efforts, they are once again asserting control over the
information assets of the organization in legacy systems and data
warehousing efforts. Almost simultaneously, however, the limitations of
these massive systems are being offset by decentralization efforts, such
as data marts.
What is apparent from watching the development of information processing
is that many technologies are being brought to bear on a problem of great
significance. Conceptually, we are attempting to apply our approaches
from experience to this new way of thinking. The techniques that are
applicable are being kept. Those found wanting are being discarded.
What is not apparent to many people directly involved in this process is
that there is not one right answer to the questions being posed. In fact
there are many, each alternative with its own performance level.
It is important to recognize that the information processing systems being
developed are intended to support human decision systems. There are no
predetermined goals and objectives, other than those instituted by the
sponsors of the project. There are few causal reasons for the decisions
being supported.
In most cases we are attempting to predict or classify other human
behaviors, based on past experience. We need to recognize the
inconsistencies between people, and within the same people at different
points in time. We need to accept the idea of probabilistically correct
decision making. We cannot expect that every individual decision will be
correct any more than a casino operator expects to win every hand of cards
or roll of the dice.
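
The casino comparison can be made concrete with a little arithmetic. The
sketch below is only an illustration, not something from the article: it
assumes an even-money bet on an American roulette wheel (18 winning
pockets out of 38), and the function names are made up for the example.
It shows that the house loses many individual bets yet holds a small,
reliable edge on average.

```python
from fractions import Fraction
import random

# Assumed example: an even-money bet on an American roulette wheel,
# where the player wins on 18 of 38 pockets. Not taken from the article.
P_PLAYER_WINS = Fraction(18, 38)


def house_edge(p_player_wins):
    """Expected house profit per 1-unit even-money bet."""
    # House gains 1 unit when the player loses, pays 1 unit when the player wins.
    return (1 - p_player_wins) - p_player_wins


def simulate_house_profit(n_bets, p_player_wins, seed=0):
    """Net units won by the house over n_bets independent 1-unit bets."""
    rng = random.Random(seed)
    profit = 0
    for _ in range(n_bets):
        player_wins = rng.random() < p_player_wins
        profit += -1 if player_wins else 1
    return profit


edge = house_edge(P_PLAYER_WINS)  # Fraction(1, 19), a little over 5%
print("House edge per unit bet:", edge, "=", float(edge))
print("House profit over 100,000 simulated bets:",
      simulate_house_profit(100_000, float(P_PLAYER_WINS)))
```

The operator is wrong on nearly half of the individual bets and still
comes out ahead over many of them, which is the sense in which a decision
rule can be "probabilistically correct."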
If the data analyst is developing a mathematical model, he is acting very
much like the customer of a casino who has never gambled before. By
observing the behavior of others and the outcomes that result, the data
analyst attempts to develop a set of rules that will result in success.
Many extremely talented people have set out to improve on the decision
making of their group, or even of their organization. The most common
single cause of failure for these people is a lack of a clear definition
of both the game they are playing and of winning. Imagine the analyst who
has developed a perfectly good model of winning at blackjack suddenly
walking up to a table playing five-card stud.
Our expectations of technology are often unrealistic to the point that we
expect grand solutions to significant problems with little effort. We
expect software to understand our problem and what we deem to be important
in its solution. We then expect the software to look at our data, in
whatever form we may have collected it, and distill a magic solution. In
most cases, we would have a better chance of success by searching for a
bottle with a genie in it.
In modeling human behaviors for decision making, we need to begin by
clearly defining the parameters under which we will operate, the
constraints that will be placed on use, current performance levels and the
performance metrics used to define success.
IMPROVING PERFORMANCE
Our goal is improving performance, however we define performance. Each
individual, and each organization, has its own definition. It should be
a priority of any project to begin with a clear definition of performance.
Far too many projects begin with collecting and examining data to see
"what we can find out." Can you imagine a gold mining operation ordering
a load of gravel from the local distributor on the off chance of finding
some gold in it?
Ultimately, we are seeking better decision making for a particular
problem. We are attempting to modify our behavior when faced with a set
of circumstances. The characteristics of others, and our expectations of
their behaviors, define the circumstances we face in making a decision.
Goals and performance metrics need to be specific and measurable. "Make
more money" is not acceptable. It is simply too general. Does it imply
that we can use any and all means to achieve this simple goal? Can we
change the product, modify any system, endure any hardship? Or, are there
other parameters and constraints?
Do we have any experience with this problem, or are we starting from
scratch? Do we have a decision making process with well defined
parameters in place? Are we seeking incremental improvement to an
existing process?
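
One way to picture a "specific and measurable" goal is as a small record
that names the metric, the current performance level, the target, and at
least one constraint. The sketch below is hypothetical throughout: the
class, the field names, and the mailing-campaign numbers are assumptions
made for illustration, not anything prescribed by the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """A specific, measurable goal (all values used below are hypothetical)."""
    metric: str      # what is being measured
    baseline: float  # current performance level
    target: float    # level that counts as success
    max_cost: float  # one example of a constraint on how we get there


def improvement(goal, measured):
    """Incremental change relative to the existing process."""
    return measured - goal.baseline


def is_success(goal, measured, cost):
    """Success means reaching the target without violating the constraint."""
    return measured >= goal.target and cost <= goal.max_cost


# Hypothetical example: raise a mailing response rate from 2.0% to 2.5%
# while spending no more than 50,000 on the effort.
goal = Goal(metric="mailing response rate",
            baseline=0.020, target=0.025, max_cost=50_000.0)

print(is_success(goal, measured=0.026, cost=40_000.0))
print(is_success(goal, measured=0.026, cost=60_000.0))
```

Unlike "make more money," a goal written this way can be checked against
a measured result, and it makes the constraint explicit instead of
leaving it implied.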
We accept that we are attempting to improve our performance, either
individually or organizationally, by enhancing our understanding of the
problem. We have developed good, complete definitions of our problem, our
current status, and of success. Now, what can we do to reach success?
THE DATA MINING ENVIRONMENT
The development of better decision making models has fallen under many
titles. Fifteen years ago, the author did not realize that he was doing
"data mining" or "knowledge discovery." At the time, these efforts were
called "exploratory data analysis" by statisticians, and "knowledge
engineering" by the practitioners of expert systems.
Whatever the title placed on these efforts today, many technologies and
techniques can be introduced to the effort. Each technology, and each
specific implementation of that technology, has its own set of strengths
and weaknesses. The key to high levels of success lies in understanding
how the strengths of one technology can offset the weaknesses of another,
and then implementing a solution that integrates the strengths of many
technologies to improve the problem YOU are working on.
This article divides the data mining environment into five main segments:
- Organizational Considerations
- Data Management
- Communications Technologies
- Data Analysis & Modeling
- Implementation & Delivery Systems
Each segment of the data mining environment offers many alternatives.
And, each raises a number of issues for the practitioner to address.
While it is obviously beyond the scope of this article to address all of
these segments, or even one of them, in any detail, it is important for
the practitioner to recognize the interaction between these components of
the data mining environment.
Each of the segments can be decomposed into a number of technologies.
Within each of the technologies, a number of alternative solutions exist.
A brief listing of technologies applicable to the Data Analysis and
Modeling segment is given below as an example of the breadth available.
DATA ANALYSIS AND MODELING
  DATA VISUALIZATION
  STATISTICS
    Statistics - General
    CART
    CHAID
    Factor Analysis
    K Nearest Neighbor
    Logistic Regression
    MARS
    Optimization
    Principal Components Analysis
    Regression
  ADVANCED TECHNOLOGIES
    Advanced Technology - General
    Case Based Reasoning
    Decision Trees
    Fuzzy Logic
    Genetic Algorithms
    Neural Nets
    Non-linear Dynamical Systems
    Rough Sets
    Rule Based Systems
CONCLUSION
This article attempts to place the emphasis of data mining where it
belongs: on improving performance by clearly identifying the goals of the
project, and recognizing the myriad of tools and techniques available for
that purpose. The practitioner who can integrate these tools effectively
can reach well beyond the level of any one tool used independently. The
practitioner is also well advised to consider the overall environment in
which the data mining effort takes place.
ABOUT THE AUTHOR
Thomas A. "Tony" Rathburn has assisted clients in extracting information
from data for over 15 years. His background includes seven years as an
Instructor in the College of Business Administration at Kent State
University, and extensive consulting experience in modeling behavioral
problems in banking, insurance and the financial markets.
Mr. Rathburn is the webmaster of KDD98 <http://www.kdd98.com>, a web site
devoted to the exchange of information related to the technologies
included in the fields of knowledge discovery, data mining, decision
support, and exploratory data analysis.
Mr. Rathburn is also the instructor for "Advanced Techniques for the
Analysis of Financial Markets" sponsored by The Gordian Institute. This
course focuses on the development and implementation of techniques that
can be directly applied to trading financial instruments in a manner
consistent with the attendees' goals and objectives. The course presents a
development methodology that allows attendees to identify trading
opportunities with a high probability of success. Mr. Rathburn can be
contacted by Email at TRathburn@kdd98.com.
--------------------------------------------------------------------------
3. A New On-line Executive Journal for Data-Intensive
Decision Support is Launched on October 7
Your company has warehoused an ocean of data - it's a major investment -
but only if you use it to your advantage... Accurate and timely guidance
will determine if you sail... or sink. With a myriad of options
available, which course do you chart? Soon, a beacon will appear...
Tabor Griffin Communications, publisher of HPCwire, is proud to announce
the imminent arrival of D S *. D S stands for decision support, the
single concept dynamically underlying technologies designed to extract
maximum value from very large databases, e.g. data mining, data
warehousing, knowledge discovery, OLAP, etc.
The * (pronounced "star") signifies both that which is preeminent and the
UNIX shell wildcard that matches every name. (Thus, if the command line
"rm *" is entered on a UNIX system, every non-hidden file in the current
directory will be removed.) Together, these symbols create a designation as
unique and exclusive as the publication itself.
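
As a small illustration of the "*" described above: in a UNIX shell it is
a wildcard that the shell expands to the names in the current directory,
conventionally skipping names that begin with a dot. The sketch below
uses Python's standard glob module, which follows the same convention, to
show what a bare "*" would match. The file names and the helper function
are made up for the example, and nothing is deleted.

```python
import glob
import os
import tempfile


def names_matching_star(directory):
    """Return the names in `directory` that the pattern '*' matches."""
    pattern = os.path.join(directory, "*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


# Build a throwaway directory with two ordinary files and one dot-file.
with tempfile.TemporaryDirectory() as scratch:
    for name in ("report.txt", "sales.csv", ".hidden"):
        open(os.path.join(scratch, name), "w").close()
    matched = names_matching_star(scratch)

# The dot-file is not matched, just as a shell's "rm *" would leave it alone.
print(matched)
```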
While there are many sources of information covering issues related to
- Data Warehousing,
- Data Mining,
- Decision Support,
- On-Line Transaction Processing (OLTP),
- On-Line Analytic Processing (OLAP),
confusion about profitable leveraging of these technologies has never been
greater. D S * has been created for professionals who need concrete,
proven strategic guidance through this morass of facts and figures. D S *
features analysis, commentary and specific guidance from renowned experts
who have shown firms how to extract financial benefit from very large data
sets. Now D S * will bring these executive insights to you weekly in a
concise digital format.
The first issue of D S * was broadcast on October 7. Please accept our
invitation for a free trial subscription to this timely, relevant and
insightful new executive journal. Send an Email to dstrial@tgc.com for
a free trial subscription to D S *.
--------------------------------------------------------------------------
4. G6G's Directory of Intelligent Software has been Upgraded
The "G6G Directory of Intelligent Software" has been, for the last seven (7)
consecutive weeks, the "Hot Link of the Week" at the TOP of
http://library.microsoft.com
G6G's Directory of Intelligent Software has been revised through the
following:
(1) The "Intelligent Software/Hardware" product abstracts and contact
info have been fully updated.
(2) It is now possible to recommend an Intelligent Software/Hardware
product for inclusion into the "G6G Directory of Intelligent
Software."
(3) A list server has been installed so that visitors may join an
Intelligent Software mailing list and receive periodic "What's
New" info about http://www.intelligent-dir.com
(4) An "Intelligent Software Forum" promoting Intelligent
Software/Hardware chat and messages has been created.
We encourage you to again visit The G6G Directory of Intelligent Software
and explore the new features.
_________________________________________
The G6G Directory of Intelligent Software
-----------------------------------------
http://www.intelligent-dir.com
-----------------------------------------
G6G Consulting Group
(310) 458-4187
g6g@asset.com
_________________________________________
--------------------------------------------------------------------------
5. Newsletter summary and unsubscribe instructions
The Gordian Institute Newsletter is designed to provide quarterly news
releases which include articles, schedule updates and new course
announcements. The Gordian Institute specializes in the instruction of
new software technologies through first-rate, hands-on intensive training
courses in the fields of:
- Data Mining and Pattern Recognition
- Adaptive Machine Learning
- Intelligent Decision Systems
- Knowledge Engineering
- Hybrid AI Techniques
Gordian Institute's newsletter is not broadcast unsolicited. This
newsletter is shared with those who have elected to be on Gordian's
electronic update list, or supplied contact information to The Gordian
Institute when requesting product information. If you do not wish to receive
future releases, simply send an empty reply with "remove" in the subject
field.
The parent company, American Heuristics Corporation (AHC), is a founding
member of the West Virginia High Technology Consortium, with headquarters
in Triadelphia, West Virginia. AHC is an advanced technology consulting
company applying hybrid solutions to complex technical problems in
business, industry and government. AHC may be found on the web at
http://www.heuristics.com.
=========================================================================