97Q4 Gordian Newsletter--Data Mining

    Posted 11-11-1997 16:34
    It may be a bit on the esoteric side, but this usually spurs some thoughts.
    It is edited for length and 'cause they are not paying me to pass this
    along.

    ______________________
    Great Optimism,

    Dutch Driver
    Abilene, TX
    Hm. Telephone: 915.698.7217
    mailto:ddriver@cs1.mcm.edu

    ---------- Forwarded message ----------
    Date: Tue, 11 Nov 1997 15:35:59 -0500
    From: agent@gordianknot.com
    To: ddriver@cs1.MCM.edu
    Subject: 97Q4 Gordian Newsletter


    =================GORDIAN INSTITUTE ELECTRONIC NEWSLETTER==================
    November 11, 1997 | This edition:
    ---------------------------------------

    1. Gordian's Curriculum Expands with Two New Offerings

    2. Staff Article: "Data Mining... Or Just Picking Up Rocks?"
    by Thomas "Tony" Rathburn

    3. A New On-line Executive Journal for Data-Intensive
    Decision Support is Launched on October 7

    4. G6G's Directory of Intelligent Software has been Upgraded

    5. Newsletter summary and unsubscribe instructions

    __________________________
    The Gordian Institute
    http://www.gordianknot.com
    agent@gordianknot.com
    (800) 405-2114
    __________________________

    Additional details for both courses, to include specific dates, training
    sites and detailed course outlines may be obtained through any of the
    following:

    - Web: http://www.gordianknot.com
    - Toll Free: 800-405-2114
    - Direct: 281-364-9882
    - Fax: 304-547-4203
    - Email: agent@gordianknot.com
    (Reply with either of the following as the SUBJECT)
    - Data Mining Details
    - Financial Markets Details


    --------------------------------------------------------------------------

    2. Staff Article

    DATA MINING... OR JUST PICKING UP ROCKS?
    - by Thomas "Tony" Rathburn


    INTRODUCTION
    For decades, businesses have recognized the potential of the vast
    quantities of data they collect. The data held in transaction processing
    systems has the potential to reveal dramatic improvements in the way
    businesses operate. The methods employed to extract this information have
    been diverse.

    Early efforts relied largely on statistical techniques to capture basic
    relationships and descriptive facts. As analysts and technology became
    more sophisticated, other tools were tested.

    Today, a wide variety of tools and techniques exist for the extraction of
    information content from enterprise data. Managing this effort requires a
    diverse set of skills ranging from project management to technical
    expertise to domain specific knowledge.

    There is no magic in this effort. The concept of artificial intelligence
    has not lived up to its hype. The reality is that the search for
    information is specific to the goals of the project. It is unrealistic to
    expect any technology to frame the problem the way a specific
    organization needs it framed. It is realistic to develop a toolbox of
    techniques for analyzing data in order to improve decision-making
    processes.

    No one technology provides all the answers. The organizations achieving
    the best results understand the strengths and weaknesses of the
    technologies they employ, and they understand the problem they are working
    on. They integrate their tools to achieve incremental improvements in
    specific applications. They are driven by goals. They are not walking
    around picking up rocks to see if any one might have value to them.


    INFORMATION EVOLUTION
    A history of industrial evolution might include significant events like
    the discovery of fire, the invention of the wheel, the first printing
    press, the cotton gin, and the development of the assembly line. This
    evolution would also have to include the conceptual development necessary
    to move from tradesmen operating on an individual basis to multinational
    organizations employing the specialized skills of hundreds of thousands of
    people in a coordinated manner.

    The information evolution can also be examined in this manner. The
    development of technology like the computer, the personal computer, mass
    data storage devices, the modem and communications devices, networking
    technology, and the Internet will undoubtedly be seen as significant from
    a historical perspective.

    We are also witnessing the conceptual development of information
    processing. We have moved from centralized control of large
    transaction-based systems to PC-based reporting and analysis systems.
    Recently, we have seen these individual efforts integrated into networked
    systems. As organizations recognize the value of these individual and
    small-group efforts, they are once again asserting control over the
    information assets of the organization through legacy systems and data
    warehousing efforts. Almost simultaneously, however, the limitations of
    these massive systems are being offset by decentralization efforts, such
    as data marts.

    What is apparent from watching the development of information processing
    is that many technologies are being brought to bear on a problem of great
    significance. Conceptually, we are applying approaches drawn from past
    experience to this new way of thinking. The techniques that prove
    applicable are being kept. Those found wanting are being discarded.

    What is not apparent to many people directly involved in this process is
    that there is not one right answer to the questions being posed. In fact,
    there are many, each with its own level of performance.

    It is important to recognize that the information processing systems being
    developed are intended to support human decision making. There are no
    predetermined goals and objectives, other than those set by the sponsors
    of the project. Few of the decisions being supported rest on
    well-understood causal relationships.

    In most cases we are attempting to predict or classify other people's
    behavior, based on past experience. We need to recognize the
    inconsistencies between different people, and within the same person at
    different points in time. We need to accept the idea of probabilistically
    correct decision making. We cannot expect every individual decision to be
    correct any more than a casino operator expects to win every hand of cards
    or roll of the dice.
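    To put rough numbers on the casino comparison, the Python sketch below
    assumes a hypothetical decision rule that is right 60% of the time,
    gaining one unit when right and losing one unit when wrong; these figures
    are illustrative and do not come from the article. The expected payoff
    per decision is 0.6 x 1 - 0.4 x 1 = 0.2 units, so the rule is wrong on
    roughly four decisions in ten yet comes out ahead over many decisions,
    which is the casino operator's position.

```python
import random


def expected_value(p_correct: float, gain: float = 1.0, loss: float = 1.0) -> float:
    """Expected payoff of one decision: +gain with probability p_correct, otherwise -loss."""
    return p_correct * gain - (1.0 - p_correct) * loss


def simulate(p_correct: float, n_decisions: int, gain: float = 1.0,
             loss: float = 1.0, seed: int = 0) -> tuple[int, float]:
    """Play n_decisions independent decisions; return (number wrong, average payoff)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wrong = 0
    total = 0.0
    for _ in range(n_decisions):
        if rng.random() < p_correct:
            total += gain   # right call: collect the gain
        else:
            total -= loss   # wrong call: pay the loss
            wrong += 1
    return wrong, total / n_decisions


if __name__ == "__main__":
    p = 0.6  # hypothetical accuracy of the decision rule
    print("expected payoff per decision:", round(expected_value(p), 3))
    wrong, avg = simulate(p, 10_000)
    print(f"wrong in {wrong} of 10000 decisions, average payoff {avg:.3f}")
```

    The fixed seed only makes the run repeatable; the point is that the
    average payoff settles near the expected value even though thousands of
    the individual decisions are wrong.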

    If the data analyst is developing a mathematical model, he is acting very
    much like the customer of a casino who has never gambled before. By
    observing the behavior of others and the outcomes that result, the data
    analyst attempts to develop a set of rules that will result in success.

    Many extremely talented people have set out to improve on the decision
    making of their group, or even of their organization. The most common
    single cause of failure for these people is the lack of a clear definition
    of both the game they are playing and what it means to win. Imagine the
    analyst who has developed a perfectly good model for winning at blackjack
    suddenly walking up to a table playing five-card stud.

    Our expectations of technology are often unrealistic to the point that we
    expect grand solutions to significant problems with little effort. We
    expect software to understand our problem and what we deem to be important
    in its solution. We then expect the software to look at our data, in
    whatever form we may have collected it, and distill a magic solution. In
    most cases, we would have a better chance of success by searching for a
    bottle with a genie in it.

    In modeling human behavior for decision making, we need to begin by
    clearly defining the parameters under which we will operate, the
    constraints that will be placed on the model's use, current performance
    levels, and the performance metrics used to define success.
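    As a minimal sketch of what such a definition might look like once it is
    written down, the Python below records the four items named above:
    operating parameters, constraints, current performance, and the metric
    that defines success. The class, its field names, the response-rate metric
    and every figure in the example are hypothetical illustrations, not
    anything prescribed by the article.

```python
from dataclasses import dataclass


@dataclass
class ProjectDefinition:
    """Hypothetical record of a modeling project's ground rules, agreed before any data is examined."""
    goal: str                # the decision being improved
    parameters: list[str]    # conditions under which the model will operate
    constraints: list[str]   # limits placed on how the model may be used
    metric: str              # name of the performance measure
    baseline: float          # current performance on that measure
    target: float            # performance level that counts as success
    higher_is_better: bool = True

    def improvement(self, measured: float) -> float:
        """Change from the baseline, signed so that a positive value means better."""
        delta = measured - self.baseline
        return delta if self.higher_is_better else -delta

    def is_success(self, measured: float) -> bool:
        """True if the measured result reaches the agreed target."""
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


if __name__ == "__main__":
    # Hypothetical example: a measurable goal in place of "make more money".
    mailing = ProjectDefinition(
        goal="choose which existing customers receive a direct-mail offer",
        parameters=["existing customers only", "one mailing per quarter"],
        constraints=["at most 50,000 pieces per mailing"],
        metric="response rate",
        baseline=0.020,
        target=0.025,
    )
    result = 0.027  # hypothetical measured response rate after deployment
    print(mailing.is_success(result), round(mailing.improvement(result), 4))
```

    Fixing the baseline and the target before any data is examined is the
    point of the exercise: the result is judged against numbers agreed in
    advance, not against whatever the data happens to show.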


    IMPROVING PERFORMANCE
    Our goal is improving performance, however we define performance. Each
    individual, and each organization, has its own definition. It should be
    a priority of any project to begin with a clear definition of performance.
    Far too many projects begin with collecting and examining data to see
    "what we can find out." Can you imagine a gold mining operation ordering
    a load of gravel from the local distributor on the off chance of finding
    some gold in it?

    Ultimately, we are seeking better decision making for a particular
    problem. We are attempting to modify our behavior when faced with a set
    of circumstances. The characteristics of others, and our expectations of
    their behaviors, define the circumstances we face in making a decision.

    Goals and performance metrics need to be specific and measurable. "Make
    more money" is not acceptable. It is simply too general. Does it imply
    that we can use any and all means to achieve this simple goal? Can we
    change the product, modify any system, endure any hardship? Or are there
    other parameters and constraints?

    Do we have any experience with this problem, or are we starting from
    scratch? Do we have a decision making process with well defined
    parameters in place? Are we seeking incremental improvement to an
    existing process?

    We accept that we are attempting to improve our performance, either
    individually or organizationally, by enhancing our understanding of the
    problem. We have developed good, complete definitions of our problem, our
    current status, and of success. Now, what can we do to reach success?


    THE DATA MINING ENVIRONMENT
    The development of better decision making models has fallen under many
    titles. Fifteen years ago, the author did not realize that he was doing
    "data mining" or "knowledge discovery." At the time, these efforts were
    called "exploratory data analysis" by statisticians, and "knowledge
    engineering" by the practitioners of expert systems.

    Whatever the title placed on these efforts today, many technologies and
    techniques can be brought to bear on them. Each technology, and each
    specific implementation of that technology, has its own set of strengths
    and weaknesses. The key to high levels of success lies in understanding
    how the strengths of one technology can offset the weaknesses of another,
    and then implementing a solution that integrates the strengths of many
    technologies to improve results on the problem YOU are working on.
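    One simple, well-known way to see how combining models can beat any single
    one is majority voting. It is a stand-in chosen for illustration, not the
    integration method the article describes. The sketch assumes,
    hypothetically, several classifiers of equal accuracy whose errors are
    fully independent; models built from the same data are usually
    correlated, so the gain in practice is smaller. Under that assumption,
    three models that are each 70% accurate give a majority vote that is right
    whenever at least two are right: 0.7^3 + 3 x 0.7^2 x 0.3 = 0.784.

```python
from math import comb


def majority_vote_accuracy(p: float, n_models: int) -> float:
    """Probability that a majority of n_models independent classifiers, each
    correct with probability p, is correct. n_models must be odd so there are no ties."""
    if n_models < 1 or n_models % 2 == 0:
        raise ValueError("n_models must be a positive odd number")
    needed = n_models // 2 + 1  # smallest number of correct votes that forms a majority
    # Binomial sum: probability that `needed` or more of the models are correct.
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, round(majority_vote_accuracy(0.7, n), 4))
```

    With p = 0.7 the accuracy rises from 0.70 for one model to 0.784 for three
    and about 0.837 for five. With p = 0.5 voting gains nothing, which is a
    reminder that combining tools only helps when each one carries real
    information about the problem.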

    This article divides the data mining environment into five main segments:

    - Organizational Considerations
    - Data Management
    - Communications Technologies
    - Data Analysis & Modeling
    - Implementation & Delivery Systems

    Each segment of the data mining environment offers many alternatives,
    and each raises a number of issues for the practitioner to address.
    While it is obviously beyond the scope of this article to address all of
    these segments, or even one of them, in any detail, it is important for
    the practitioner to recognize the interaction between these components of
    the data mining environment.

    Each of the segments can be decomposed into a number of technologies.
    Within each of the technologies, a number of alternative solutions exist.
    A brief list of technologies applicable to the Data Analysis and
    Modeling segment follows, as an example of the breadth available.

    DATA ANALYSIS AND MODELING

      DATA VISUALIZATION

      STATISTICS
        - Statistics - General
        - CART
        - CHAID
        - Factor Analysis
        - K Nearest Neighbor
        - Logistic Regression
        - MARS
        - Optimization
        - Principal Components Analysis
        - Regression

      ADVANCED TECHNOLOGIES
        - Advanced Technology - General
        - Case Based Reasoning
        - Decision Trees
        - Fuzzy Logic
        - Genetic Algorithms
        - Neural Nets
        - Non-linear Dynamical Systems
        - Rough Sets
        - Rule Based Systems


    CONCLUSION
    This article attempts to place the emphasis of data mining where it
    belongs: on improving performance by clearly identifying the goals of the
    project, and recognizing the myriad of tools and techniques available for
    that purpose. The practitioner who can integrate these tools effectively
    can reach well beyond the level of any one tool used independently. The
    practitioner is also well advised to consider the overall environment in
    which the data mining effort takes place.


    ABOUT THE AUTHOR
    Thomas A. "Tony" Rathburn has assisted clients in extracting information
    from data for over 15 years. His background includes seven years as an
    Instructor in the College of Business Administration at Kent State
    University, and extensive consulting experience in modeling behavioral
    problems in banking, insurance and the financial markets.

    Mr. Rathburn is the webmaster of KDD98 <http://www.kdd98.com>, a web
    site devoted to the exchange of information related to the technologies
    included in the fields of knowledge discovery, data mining, decision
    support, and exploratory data analysis.

    Mr. Rathburn is also the instructor for "Advanced Techniques for the
    Analysis of Financial Markets" sponsored by The Gordian Institute. This
    course focuses on the development and implementation of techniques that
    can be directly applied to trading financial instruments in a manner
    consistent with the attendees' goals and objectives. The course presents a
    development methodology that allows attendees to identify trading
    opportunities with a high probability of success. Mr. Rathburn can be
    contacted by Email at TRathburn@kdd98.com.


    --------------------------------------------------------------------------

    3. A New On-line Executive Journal for Data-Intensive
    Decision Support is Launched on October 7

    Your company has warehoused an ocean of data. It's a major investment,
    but it pays off only if you use it to your advantage... Accurate and
    timely guidance will determine if you sail... or sink. With a myriad of
    options available, which course do you chart? Soon, a beacon will appear...

    Tabor Griffin Communications, publisher of HPCwire, is proud to announce
    the imminent arrival of D S *. D S stands for decision support, the
    single concept dynamically underlying technologies designed to extract
    maximum value from very large databases, e.g. data mining, data
    warehousing, knowledge discovery, OLAP, etc.

    The * (pronounced "star") signifies both that which is preeminent and the
    UNIX wildcard character, which applies universally. (Thus, if the command
    line "rm *" is entered at a UNIX shell, every ordinary file in the current
    directory will be removed; hidden "dot" files and subdirectories are left
    alone.) Together, these symbols create a designation as unique and
    exclusive as the publication itself.
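    A UNIX shell expands "*" into the matching file names before "rm" ever
    runs. Python's standard glob module applies the same default rule for
    hidden names, so the small sketch below uses it in a throwaway directory
    to show what a bare "*" actually picks up; the file names are made up for
    illustration. Names beginning with a dot are not matched. Subdirectories
    are matched, but they survive a plain "rm *" because rm without -r
    refuses to delete directories.

```python
import glob
import os
import tempfile


def star_matches(directory: str) -> list[str]:
    """Return, sorted, the names that a bare '*' pattern matches inside `directory`."""
    # glob.escape protects any wildcard characters that happen to be in the directory path.
    pattern = os.path.join(glob.escape(directory), "*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # Hypothetical contents: two ordinary files, one hidden file, one subdirectory.
        for name in ("report.txt", "data.csv", ".profile"):
            with open(os.path.join(scratch, name), "w") as fh:
                fh.write("example\n")
        os.mkdir(os.path.join(scratch, "archive"))
        # Prints ['archive', 'data.csv', 'report.txt']: the hidden file is not matched.
        print(star_matches(scratch))
```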

    While there are many sources of information covering issues related to
    - Data Warehousing,
    - Data Mining,
    - Decision Support,
    - On-Line Transaction Processing (OLTP),
    - On-Line Analytic Processing (OLAP),

    confusion about profitable leveraging of these technologies has never been
    greater. D S * has been created for professionals who need concrete,
    proven strategic guidance through this morass of facts and figures. D S *
    features analysis, commentary and specific guidance from renowned experts
    who have shown firms how to extract financial benefit from very large data
    sets. Now D S * will bring these executive insights to you weekly in a
    concise digital format.

    The first issue of D S * was broadcast on October 7. Please accept our
    invitation for a free trial subscription to this timely, relevant and
    insightful new executive journal. Send an email to dstrial@tgc.com for
    a free trial subscription to D S *.


    --------------------------------------------------------------------------

    4. G6G's Directory of Intelligent Software has been Upgraded

    For the last seven (7) consecutive weeks, the "G6G Directory of
    Intelligent Software" has been the "Hot Link of the Week" at the TOP of
    http://library.microsoft.com

    G6G's Directory of Intelligent Software has been revised in the
    following ways:

    (1) The "Intelligent Software/Hardware" product abstracts and contact
    info have been fully updated.

    (2) It is now possible to recommend an Intelligent Software/Hardware
    product for inclusion into the "G6G Directory of Intelligent
    Software."

    (3) A list server has been installed so that visitors may join an
    Intelligent Software mailing list and receive periodic "What's
    New" info about http://www.intelligent-dir.com

    (4) An "Intelligent Software Forum" promoting Intelligent
    Software/Hardware chat and messages has been created.

    We encourage you to again visit The G6G Directory of Intelligent Software
    and explore the new features.

    _________________________________________
    The G6G Directory of Intelligent Software
    -----------------------------------------
    http://www.intelligent-dir.com
    -----------------------------------------
    G6G Consulting Group
    (310) 458-4187
    g6g@asset.com
    _________________________________________


    --------------------------------------------------------------------------


    5. Newsletter summary and unsubscribe instructions

    The Gordian Institute Newsletter is designed to provide quarterly news
    releases which include articles, schedule updates and new course
    announcements. The Gordian Institute specializes in teaching new
    software technologies through first-rate, hands-on, intensive training
    courses in the fields of:
    - Data Mining and Pattern Recognition
    - Adaptive Machine Learning
    - Intelligent Decision Systems
    - Knowledge Engineering
    - Hybrid AI Techniques

    Gordian Institute's newsletter is not broadcast unsolicited. This
    newsletter is shared with those who have elected to be on Gordian's
    electronic update list, or supplied contact information to The Gordian
    Institute when requesting product information. If you do not wish to
    receive future releases, simply send an empty reply with "remove" in the
    subject field.

    The parent company, American Heuristics Corporation (AHC), is a founding
    member of the West Virginia High Technology Consortium, with headquarters
    in Triadelphia, West Virginia. AHC is an advanced technology consulting
    company applying hybrid solutions to complex technical problems in
    business, industry and government. AHC may be found on the web at:
    http://www.heuristics.com.

    =========================================================================