TIBCO Spotfire User Guide.pdf

Important Information
Reference
Technical Support
Introduction
Welcome to TIBCO Spotfire Miner™ 8.2
Key Features and Benefits of Spotfire Miner
System Requirements and Installation
How Spotfire Miner Does Data Mining
Define Goals
Access Data
Explore Data
Select and Transform Variables
Model Data
Deploy Model
The Spotfire S+ Library
Help, Support, and Learning Resources
Online Help
Online Manuals
Data Mining References
Typographic Conventions
Data Input and Output
Overview
Data Types in Spotfire Miner™
Categorical Data
Strings
Reading/Writing Data Sets with Long Column Names
Reading Long Strings
Dates
Worksheet Options for Dates
Date Parsing Formats
Date Display Formats
Limitations
Working with External Files
Reading External Files and Databases
Using Absolute and Relative Paths
Data Input
Read Text File
General Procedure
Properties
Using the Viewer
Read Fixed Format Text File
General Procedure
Properties
Using the Viewer
Read Spotfire Data
General Procedure
Properties
Using the Viewer
Read SAS File
General Procedure
Properties
Using the Viewer
Read Excel File
General Procedure
Properties
Using the Viewer
Read Other File
General Procedure
Properties
Using the Viewer
Read Database ODBC
The ODBC Data Source Administrator
ODBC Drivers
Defining a Data Source
General Procedure
Properties
Using the Viewer
Read DB2 Native
DB2 Client
General Procedure
Properties
Using the Viewer
Read Oracle Native
Oracle Client
General Procedure
Properties
Using the Viewer
Read SQL Native
Microsoft SQL Server Client
General Procedure
Properties
Using the Viewer
Read Sybase Native
Sybase Client
General Procedure
Properties
Using the Viewer
Read Database JDBC
Data Output
Write Text File
General Procedure
Properties
Using the Viewer
Write Fixed Format Text File
General Procedure
Properties
Using the Viewer
Write SAS File
General Procedure
Properties
Using the Viewer
Write Spotfire Data
General Procedure
Properties
Using the Viewer
Write Excel File
General Procedure
Properties
Using the Viewer
Write Other File
General Procedure
Properties
Using the Viewer
Write Database ODBC
General Procedure
Properties
Using the Viewer
Write DB2 Native
General Procedure
Properties
Using the Viewer
Write Oracle Native
General Procedure
Properties
Using the Viewer
Write SQL Native
General Procedure
Properties
Using the Viewer
Write Sybase Native
General Procedure
Properties
Using the Viewer
Write Database JDBC
The TIBCO Spotfire Miner™ Interface
Overview
The Main Menu
File Menu Options
Edit Menu Options
View Menu Options
Tools Menu Options
Window Menu Options
Help Menu Options
The Toolbar
The Explorer Pane
The Main Library
The Spotfire S+ Library
The User Library
Copying Nodes To Libraries
Deleting Library Components
Library Manager
Other Library Operations
The Desktop Pane
The Message Pane
The Command Line Pane
The Spotfire Miner™ Working Environment
Worksheet Directories
The Examples Folder
Building and Editing Networks
Building a Network
Adding Nodes
Navigating in a Worksheet
Annotations
Deleting Nodes
Linking Nodes
Deleting Links
Viewing The Data In Links
Link Line Style
Copying Nodes
Model Ports
Specifying Properties for Nodes
Specifying File Names
Collapsing Nodes
Creating Customized Components
Running and Stopping a Network
Running Nodes and Networks
Node Priority
Status Indicators
Data Caches
Invalidating Nodes
Stopping a Running Network
Common Features of Network Nodes
Shortcut Menus
Properties Dialogs
Opening the Properties Dialog
Sorting in Dialog Fields
Visual Cues in Dialog Fields
Advanced Page
Viewers
Launching a Viewer
Closing Viewers
The Table Viewer
Data Exploration
Overview
Creating One-Dimensional Charts
General Procedure
Chart Types
Pie Charts
Bar Charts
Column Charts
Dot Charts
Histograms
Box Plots
The Order of Levels in Categorical Variables
Properties
The Properties Page
The Options Page
Conditioned Charts
Using the Viewer
Selecting Charts
Viewing Charts
Enlarging Charts
Formatting Charts
Saving, Printing, and Copying Charts
An Example
Computing Correlations and Covariances
General Procedure
Definitions
Properties
Using the Viewer
Output
An Example
Crosstabulating Categorical Data
General Procedure
Properties
Using the Viewer
An Example
Computing Descriptive Statistics
General Procedure
Properties
Using the Viewer
Comparing Data
General Procedure
Properties
The Properties Page
The Output Page
Using the Viewer
Viewing Tables
General Procedure
Using the Viewer
Data Cleaning
Overview
Missing Values
General Procedure
Properties
The Properties Page
Using the Viewer
An Example
Duplicate Detection
General Procedure
Background
Properties
The Properties Page
The Output Page
Using the Viewer
An Example
Outlier Detection
General Procedure
Background
Properties
The Properties Page
The Output Page
Using the Viewer
An Example
Interpreting the Results
Technical Details
Why Robust Distances Are Preferable
Algorithm Specifics
References
Data Manipulation
Overview
Manipulating Rows
Aggregate
General Procedure
Properties
Using the Viewer
Append
General Procedure
Properties
Using the Viewer
Filter Rows
General Procedure
Properties
Using the Viewer
Partition
General Procedure
Properties
Using the Viewer
Sample
General Procedure
Properties
Using the Viewer
Shuffle
General Procedure
Using the Viewer
Sort
General Procedure
Properties
Using the Viewer
Split
General Procedure
Properties
Using the Viewer
Stack
General Procedure
Properties
Using the Viewer
Unstack
General Procedure
Properties
Using the Viewer
Manipulating Columns
Bin
General Procedure
Properties
Vary By Column
Using the Viewer
Create Columns
General Procedure
Properties
Using the Viewer
Filter Columns
General Procedure
Properties
Using the Viewer
Recode Columns
General Procedure
Properties
Using the Viewer
Example
Join
General Procedure
Properties
Using the Viewer
Modify Columns
General Procedure
Properties
Using the Viewer
Normalize
General Procedure
Properties
Using the Viewer
Reorder Columns
General Procedure
Properties
Using the Viewer
Transpose
General Procedure
Properties
Using the Viewer
Using the Spotfire Miner™ Expression Language
Value Types
NA Handling
Error Handling
Column References
Double and String Constants
Operators
Functions
Conversion Functions
Numeric Functions
String Functions
Date Manipulation Functions
Data Set Functions
Miscellaneous Functions
Classification Models
Overview
General Procedure
Selecting Dependent and Independent Variables
Sorting Column Names
Selecting Output
Creating Predict Nodes
Logistic Regression Models
Mathematical Definitions
Properties
The Properties Page
The Options Page
The Output Page
Using the Viewer
Creating a Filter Column node
A Cross-Sell Example
Importing and Exploring the Data
Manipulating the Data
Modeling the Data
Predicting from the Model
Technical Details
Classification Trees
Background
Growing a Tree
Pruning a Tree
Ensemble Trees
Trees in Spotfire Miner
Properties
The Properties Page
The Options Page
The Single Tree Page
The Ensemble Page
The Output Page
The Advanced Page
Using the Viewer
A Cross-Sell Example (Continued)
Importing, Exploring, and Manipulating the Data
Modeling the Data
Predicting from the Model
Classification Neural Networks
Background
Properties
The Properties Page
The Options Page
The Output Page
Using the Viewer
A Cross-Sell Example (Continued)
Importing, Exploring, and Manipulating the Data
Modeling the Data
Predicting from the Model
Technical Details
Learning Algorithms
Initialization of Weights
Naive Bayes Models
Background
Properties
The Properties Page
The Output Page
Using the Viewer
A Promoter Gene Sequence Example
Technical Details
References
Regression Models
Overview
General Procedure
Selecting Dependent and Independent Variables
Sorting Column Names
Selecting Output
Creating Predict Nodes
Linear Regression Models
Mathematical Definitions
Properties
The Properties Page
The Output Page
Using the Viewer
Creating a Filter Column node
A House Pricing Example
Importing and Exploring the Data
Manipulating the Data
Exploring and Manipulating the Data Again
Modeling the Data
Technical Details
Algorithm Specifics
The Coding of Levels in Categorical Variables
Regression Trees
Background
Growing a Tree
Ensemble Trees
Trees in Spotfire Miner
Properties
The Properties Page
The Options Page
The Single Tree Page
The Ensemble Page
The Output Page
The Advanced Page
Using the Viewer
A House Pricing Example (Continued)
Regression Neural Networks
Background
Properties
The Properties Page
The Options Page
The Output Page
Using the Viewer
A House Pricing Example (Continued)
Technical Details
Learning Algorithms
Initialization of Weights
References
Clustering
Overview
The K-Means Component
General Procedure
Properties
Properties Page
Options Page
Output Page
Tips for Better Cluster Results
Technical Details
Scalable K-Means Algorithm
Coding of Categorical Variables
Example
K-Means Clustering Example
References
Dimension Reduction
Overview
Principal Components
General Procedure
Properties
The Properties Page
The Output Page
Using the Viewer
An Example Using Principal Components
Technical Details
Association Rules
Overview
Association Rules Node Options
Properties Page
Options Page
Output Page
Definitions
Support
Confidence
Lift
Data Input Types
Groceries Example
Setting the Association Rules
Survival
Introduction
Basic Survival Models Background
General Procedure
Properties
The Properties Page
Time Varying Covariates
The Options Page
The Output Page
Using the Viewer
A Banking Customer Churn Example
A Time Varying Covariates Example
Technical Details for Cox Regression Models
Mathematical Definitions
Computational Details
Time-Dependent Covariates
Tied Events
Strata
Survival Function
References
Model Assessment
Overview
Properties
Assessing Classification Models
General Procedure
Classification Agreement
Confusion Matrices
Using the Viewer
Lift Chart
Chart Types
Assessing Regression Models
General Procedure
Definitions
Using the Viewer
Deploying Models
Overview
Predictive Modeling Markup Language
PMML Conformance
Import/Export Compatibility
Export PMML
General Procedure
Properties
Using the Viewer
Import PMML
General Procedure
Properties
Using the Viewer
Export Report
General Procedure
Properties
Transform
Using the Viewer
Advanced Topics
Overview
Pipeline Architecture
The Advanced Page
Worksheet Advanced Options
Max Rows Per Block
Max Megabytes Per Block
Order of Operations
Caching
Random Seed
Worksheet Random Seeds Option
Notes on Data Blocks and Caching
Deleting Data Caches
Worksheet Data Directories
Memory Intensive Functions
Size Recommendations for Spotfire Miner™
Worst-case Scenario Assumptions
Upper Limit Estimation for .wsd Disk Space
Command Line Options
Running Spotfire Miner in Batch
Increasing Java Memory
Importing and Exporting Data with JDBC
JDBC Example Workflow
The S-PLUS Library
Overview
S-PLUS Data Nodes
Read S-PLUS Data
General Procedure
Properties
Using the Viewer
Write S-PLUS Data
General Procedure
Properties
Using the Viewer
S-PLUS Chart Nodes
Overview
General Procedure
Using the Graph Window
Graph Options
One Column - Continuous
Data Page
Density Plot
Histogram
QQ Math Plot
One Column - Categorical
Data Page
Bar Chart
Dot Plot
Pie Chart
Two Columns - Continuous
Data Page
Hexbin Plot
Scatter Plot
Two Columns - Mixed
Data Page
Box Plot
Strip Plot
QQ Plot
Three Columns
Data Page
Contour Plot
Level Plot
Surface Plot
Cloud Plot
Multiple Columns
Multiple 2-D Plots
Data Page
Hexbin Matrix
Scatterplot Matrix
Parallel Plot
Time Series
Time Series Line Plot
Time Series High-Low Plot
Time Series Stacked Bar Plot
Common Pages
Titles Page
Axes Page
Multipanel Page
File Page
Advanced Page
S-PLUS Data Manipulation Nodes
Evaluating S-PLUS Expressions
Data Types in Spotfire Miner and Spotfire S+
Spotfire S+ Column Names
S-PLUS Create Columns
General Procedure
Properties
Using the Viewer
S-PLUS Filter Rows
General Procedure
Properties
Using the Viewer
S-PLUS Split
General Procedure
Properties
Using the Viewer
S-PLUS Script Node
General Procedure
Properties
The Properties Page
The Options Page
The Parameters Page
Processing Multiple Data Blocks
The Test Phase
Input List Elements
Output List Elements
Size of the Input Data Frames
Date and String Values
Interpreting min/max values
Debugging
Processing Data Using the Execute Big Data Script Option
Reading and Writing bdFrames
Passing Other Object Types using bdPackedObjects
Loading Spotfire S+ Modules
Examples Using the S-PLUS Script Node
Create Plots
Fit and Use a Generalized Additive Model
Passing Model Information to Prediction Nodes
Replace Missing Values
Use a Custom Library from Spotfire S+
Access Data from a Spotfire S+ Database
Filter Columns Using Dynamic Outputs
An Extended Example with Two S-PLUS Script Nodes
References
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Z
TIBCO Spotfire Miner™ 8.2 User’s Guide
November 2010
TIBCO Software Inc.
IMPORTANT INFORMATION

SOME TIBCO SOFTWARE EMBEDS OR BUNDLES OTHER TIBCO SOFTWARE. USE OF SUCH EMBEDDED OR BUNDLED TIBCO SOFTWARE IS SOLELY TO ENABLE THE FUNCTIONALITY (OR PROVIDE LIMITED ADD-ON FUNCTIONALITY) OF THE LICENSED TIBCO SOFTWARE. THE EMBEDDED OR BUNDLED SOFTWARE IS NOT LICENSED TO BE USED OR ACCESSED BY ANY OTHER TIBCO SOFTWARE OR FOR ANY OTHER PURPOSE.

USE OF TIBCO SOFTWARE AND THIS DOCUMENT IS SUBJECT TO THE TERMS AND CONDITIONS OF A LICENSE AGREEMENT FOUND IN EITHER A SEPARATELY EXECUTED SOFTWARE LICENSE AGREEMENT, OR, IF THERE IS NO SUCH SEPARATE AGREEMENT, THE CLICKWRAP END USER LICENSE AGREEMENT WHICH IS DISPLAYED DURING DOWNLOAD OR INSTALLATION OF THE SOFTWARE (AND WHICH IS DUPLICATED IN THE TIBCO SPOTFIRE MINER LICENSES). USE OF THIS DOCUMENT IS SUBJECT TO THOSE TERMS AND CONDITIONS, AND YOUR USE HEREOF SHALL CONSTITUTE ACCEPTANCE OF AND AN AGREEMENT TO BE BOUND BY THE SAME.

This document contains confidential information that is subject to U.S. and international copyright laws and treaties. No part of this document may be reproduced in any form without the written authorization of TIBCO Software Inc.

TIBCO Software Inc., TIBCO, Spotfire, TIBCO Spotfire Miner, TIBCO Spotfire S+, Insightful, the Insightful logo, the tagline "the Knowledge to Act," Insightful Miner, S+, S-PLUS, TIBCO Spotfire Axum, S+ArrayAnalyzer, S+EnvironmentalStats, S+FinMetrics, S+NuOpt, S+SeqTrial, S+SpatialStats, S+Wavelets, S-PLUS Graphlets, Graphlet, Spotfire S+ FlexBayes, Spotfire S+ Resample, TIBCO Spotfire S+ Server, TIBCO Spotfire Statistics Services, and TIBCO Spotfire Clinical Graphics are either registered trademarks or trademarks of TIBCO Software Inc. and/or subsidiaries of TIBCO Software Inc. in the United States and/or other countries. All other product and company names and marks mentioned in this document are the property of their respective owners and are mentioned for identification purposes only.

This software may be available on multiple operating systems. However, not all operating system platforms for a specific software version are released at the same time. Please see the readme.txt file for the availability of this software version on a specific operating system platform.

THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. THIS DOCUMENT COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN; THESE CHANGES WILL BE INCORPORATED IN NEW EDITIONS OF THIS DOCUMENT. TIBCO SOFTWARE INC. MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED IN THIS DOCUMENT AT ANY TIME.

Copyright © 1996-2010 TIBCO Software Inc. ALL RIGHTS RESERVED.

THE CONTENTS OF THIS DOCUMENT MAY BE MODIFIED AND/OR QUALIFIED, DIRECTLY OR INDIRECTLY, BY OTHER DOCUMENTATION WHICH ACCOMPANIES THIS SOFTWARE, INCLUDING BUT NOT LIMITED TO ANY RELEASE NOTES AND "READ ME" FILES.

TIBCO Software Inc. Confidential Information

Reference

The correct bibliographic reference for this document is as follows: TIBCO Spotfire Miner™ 8.2 User’s Guide, TIBCO Software Inc.

Technical Support

For technical support, please visit http://spotfire.tibco.com/support and register for a support account.
CONTENTS

Important Information

Chapter 1 Introduction
    Welcome to TIBCO Spotfire Miner™ 8.2
    System Requirements and Installation
    How Spotfire Miner Does Data Mining
    Help, Support, and Learning Resources
    Typographic Conventions

Chapter 2 Data Input and Output
    Overview
    Data Types in Spotfire Miner™
    Working with External Files
    Data Input
    Data Output

Chapter 3 The TIBCO Spotfire Miner™ Interface
    Overview
    The Spotfire Miner™ Working Environment
    Building and Editing Networks
    Common Features of Network Nodes

Chapter 4 Data Exploration
    Overview
    Creating One-Dimensional Charts
    Computing Correlations and Covariances
    Crosstabulating Categorical Data
    Computing Descriptive Statistics
    Comparing Data
    Viewing Tables

Chapter 5 Data Cleaning
    Overview
    Missing Values
    Duplicate Detection
    Outlier Detection
    Technical Details
    References

Chapter 6 Data Manipulation
    Overview
    Manipulating Rows
    Manipulating Columns
    Using the Spotfire Miner™ Expression Language

Chapter 7 Classification Models
    Overview
    Logistic Regression Models
    Classification Trees
    Classification Neural Networks
    Naive Bayes Models
    References

Chapter 8 Regression Models
    Overview
    Linear Regression Models
    Regression Trees
    Regression Neural Networks
    References

Chapter 9 Clustering
    Overview
    The K-Means Component
    Technical Details
    K-Means Clustering Example
    References

Chapter 10 Dimension Reduction
    Overview
    Principal Components
    An Example Using Principal Components
    Technical Details

Chapter 11 Association Rules
    Overview
    Association Rules Node Options
    Definitions
    Data Input Types
    Groceries Example

Chapter 12 Survival
    Introduction
    Basic Survival Models Background
    A Banking Customer Churn Example
    A Time Varying Covariates Example
    Technical Details for Cox Regression Models
    References

Chapter 13 Model Assessment
    Overview
    Assessing Classification Models
    Assessing Regression Models

Chapter 14 Deploying Models
    Overview
    Predictive Modeling Markup Language
    Export Report

Chapter 15 Advanced Topics
    Overview
    Pipeline Architecture
    The Advanced Page
    Notes on Data Blocks and Caching
    Memory Intensive Functions
    Size Recommendations for Spotfire Miner™
    Command Line Options
    Increasing Java Memory
    Importing and Exporting Data with JDBC

Chapter 16 The S-PLUS Library
    Overview
    S-PLUS Data Nodes
    S-PLUS Chart Nodes