Preface
About This Guide
Greenplum Database Documentation
Document Conventions
Text Conventions
Command Syntax Conventions
Naming Conventions and Acronyms
Contact Us
Section I: Introduction to Greenplum
1. About the Greenplum Architecture
About the Greenplum Master
About the Greenplum Segments
About the Greenplum Interconnect
About Redundancy and Failover in Greenplum Database
About Segment Mirroring
Segment Failover and Recovery
About Master Mirroring
About Interconnect Redundancy
About Parallel Data Loading
About Management and Monitoring
2. About Distributed Databases
Understanding How Data is Stored
Understanding Greenplum Distribution Policies
3. Summary of Greenplum Features
System Requirements
Feature Summary
Greenplum SQL Standard Conformance
Core SQL Conformance
SQL 1992 Conformance
SQL 1999 Conformance
SQL 2003 Conformance
SQL 2008 Conformance
Greenplum and PostgreSQL Compatibility
4. About Greenplum Query Processing
Understanding Query Planning and Dispatch
Understanding Greenplum Query Plans
Understanding Parallel Query Execution
Section II: Access Control and Security
5. Managing Roles and Privileges
Security Best Practices for Roles and Privileges
Creating New Roles (Users)
Altering Role Attributes
Creating Groups (Role Membership)
Managing Object Privileges
Simulating Row and Column Level Access Control
Encrypting Data
6. Configuring Client Authentication
Allowing Connections to Greenplum Database
Editing the pg_hba.conf File
Limiting Concurrent Connections
Encrypting Client/Server Connections
7. Accessing the Database
Establishing a Database Session
Supported Client Applications
Greenplum Database Client Applications
Connecting with psql
pgAdmin III for Greenplum Database
Installing pgAdmin III for Greenplum Database
Documentation for pgAdmin III for Greenplum Database
Performing Administrative Tasks with pgAdmin III
Database Application Interfaces
Third-Party Client Tools
Troubleshooting Connection Problems
8. Managing Workload and Resources
Overview of Greenplum Workload Management
How Resource Queues Work in Greenplum Database
How Resource Queue Limits Work
How Resource Queue Priorities Work
Types of Queries Evaluated for Resource Queues
Steps to Enable Workload Management
Configuring Workload Management
Creating Resource Queues
Creating Queues with an Active Query Limit
Creating Queues with a Query Planner Cost Limits
Allowing Queries to Run on Idle Systems
Allowing Small Queries to Bypass Queue Limits
Setting Priority Levels
Assigning Roles (Users) to a Resource Queue
Removing a Role from a Resource Queue
Modifying Resource Queues
Altering a Resource Queue
Dropping a Resource Queue
Checking Resource Queue Status
Viewing Queued Statements and Resource Queue Status
Viewing Resource Queue Statistics
Viewing the Roles Assigned to a Resource Queue
Viewing the Waiting Queries for a Resource Queue
Clearing a Waiting Statement From a Resource Queue
Viewing the Priority of Active Statements
Resetting the Priority of an Active Statement
Section III: Database Administration
9. Defining Database Objects
Creating and Managing Databases
About Template Databases
Creating a Database
Cloning a Database
Viewing the List of Databases
Altering a Database
Dropping a Database
Creating and Managing Tablespaces
Creating a Filespace
Creating a Tablespace
Using a Tablespace to Store Database Objects
Viewing Existing Tablespaces and Filespaces
Dropping Tablespaces and Filespaces
Creating and Managing Schemas
The Default ‘Public’ Schema
Creating a Schema
Schema Search Paths
Setting the Schema Search Path
Viewing the Current Schema
Dropping a Schema
System Schemas
Creating and Managing Tables
Creating a Table
Choosing Column Data Types
Setting Table and Column Constraints
Choosing the Table Distribution Policy
Choosing the Table Storage Model
Altering a Table
Altering Table Distribution
Altering the Table Storage Model
Dropping a Table
Partitioning Large Tables
Understanding Table Partitioning in Greenplum Database
Deciding on a Table Partitioning Strategy
Creating Partitioned Tables
Defining Date Range Table Partitions
Defining Numeric Range Table Partitions
Defining List Table Partitions
Defining Multi-level Partitions
Partitioning an Existing Table
Limitations of Partitioned Tables
Loading Partitioned Tables
Verifying Your Partition Strategy
Some Limitations of Selective Partition Scanning
Viewing Your Partition Design
Maintaining Partitioned Tables
Adding a New Partition
Renaming a Partition
Adding a Default Partition
Dropping a Partition
Truncating a Partition
Exchanging a Partition
Splitting a Partition
Modifying a Subpartition Template
Creating and Using Sequences
Creating a Sequence
Using a Sequence
Altering a Sequence
Dropping a Sequence
Using Indexes in Greenplum Database
Index Types
About Bitmap Indexes
Creating an Index
Examining Index Usage
Managing Indexes
Dropping an Index
Creating and Managing Views
Creating Views
Dropping Views
10. Managing Data
About Concurrency Control in Greenplum Database
Inserting New Rows
Updating Existing Rows
Deleting Rows
Truncating a Table
Working With Transactions
Transaction Isolation Levels
Vacuuming the Database
Configuring the Free Space Map
11. Querying Data
Defining Queries
SQL Lexicon
SQL Value Expressions
Column References
Positional Parameters
Subscripts
Field Selection
Operator Invocations
Function Calls
Aggregate Expressions
Window Expressions
Type Casts
Scalar Subqueries
Array Constructors
Row Constructors
Expression Evaluation Rules
Using Functions and Operators
Using Functions in Greenplum Database
User-Defined Functions
Built-in Functions and Operators
Window Functions
Advanced Analytic Functions
Advanced Analytic Function Examples
Query Profiling
Reading EXPLAIN Output
EXPLAIN Example
Reading EXPLAIN ANALYZE Output
EXPLAIN ANALYZE Example
What to Look for in a Query Plan
12. Loading and Unloading Data
About the Greenplum Database Loading Tools
About External Tables
About gpload
About COPY
Formatting Your Data Files
Formatting Rows
Formatting Columns
Representing NULL Values
Escaping
Escaping in Text Formatted Files
Escaping in CSV Formatted Files
Character Encoding
Loading Data into Greenplum Database
Creating and Using File-Based External Tables
Defining the External Table
Using the Greenplum Parallel File Server (gpfdist)
Creating and Using Web External Tables
Defining Command-Based Web External Tables
Defining URL-Based Web External Tables
Loading Data Using an External Table
Handling Load Errors
Defining an External Table with Single Row Error Isolation
Viewing Bad Rows in the Error Table
Loading Data with gpload
Loading Data with COPY
Running COPY in Single Row Error Isolation Mode
Data Loading Performance Tips
Unloading Data from Greenplum Database
Defining a File-Based Writable External Table
Defining a Command-Based Writable External Table
Unloading Data Using a Writable External Table
Unloading Data Using COPY
Section IV: System Administration
13. Starting and Stopping Greenplum
Overview
Starting Greenplum Database
Restarting Greenplum Database
Uploading Configuration File Changes Only
Starting the Master in Maintenance Mode
Stopping Greenplum Database
14. Configuring Your Greenplum System
About Greenplum Master, Global, and Local Parameters
Setting Configuration Parameters
Setting a Local Configuration Parameter
Setting a Global or Master Configuration Parameter
Setting Parameters at the System Level
Setting Parameters at the Database Level
Setting Parameters at the Role Level
Setting Parameters in a Session
Viewing Settings of Server Configuration Parameters
Configuration Parameter Categories
Connection and Authentication Parameters
Connection Parameters
Security and Authentication Parameters
System Resource Consumption Parameters
Memory Consumption Parameters
Free Space Map Parameters
OS Resource Parameters
Cost-Based Vacuum Delay Parameters
Query Tuning Parameters
Query Plan Operator Control Parameters
Query Planner Costing Parameters
Database Statistics Sampling Parameters
Sort Operator Configuration Parameters
Aggregate Operator Configuration Parameters
Join Operator Configuration Parameters
Other Query Planner Configuration Parameters
Error Reporting and Logging Parameters
Log Rotation
When to Log
What to Log
System Monitoring Parameters
SNMP Alerts
Email Alerts
Greenplum Performance Monitor Agent
Runtime Statistics Collection Parameters
Automatic Statistics Collection Parameters
Client Connection Default Parameters
Statement Behavior Parameters
Locale and Formatting Parameters
Other Client Default Parameters
Lock Management Parameters
Workoad Management Parameters
External Table Parameters
Append-Only Table Parameters
Database and Tablespace/Filespace Parameters
Past PostgreSQL Version Compatibility Parameters
Greenplum Array Configuration Parameters
Interconnect Configuration Parameters
Dispatch Configuration Parameters
Fault Operation Parameters
Distributed Transaction Management Parameters
Read-Only Parameters
15. Enabling High Availability Features
Overview of High Availability in Greenplum Database
Overview of Segment Mirroring
Overview of Master Mirroring
Overview of Fault Detection and Recovery
Enabling Mirroring in Greenplum Database
Enabling Segment Mirroring
Enabling Master Mirroring
Knowing When a Segment is Down
Enabling Alerts and Notifications
Checking for Failed Segments
Checking the Log Files
Recovering a Failed Segment
Recovering From Segment Failures
Recovering a Failed Master
Restoring Master Mirroring After a Recovery
16. Backing Up and Restoring Databases
Overview of Backup and Restore Operations
About Parallel Backups
About Non-Parallel Backups
About Parallel Restores
About Non-Parallel Restores
Backing Up a Database
Backing Up a Database with gp_dump
Automating Parallel Backups with gpcrondump
Restoring From Parallel Backup Files
Restoring a Database with gp_restore
Restoring a Database Using gpdbrestore
Restoring to a Different Greenplum System Configuration
17. Expanding a Greenplum System
Planning Greenplum System Expansion
System Expansion Overview
System Expansion Checklist
Planning New Hardware Platforms
Planning Initialization of New Segments
Planning Mirror Segments
Increasing Segments Per Host
About the Expansion Schema
Planning Table Redistribution
Managing Redistribution in Large-Scale Greenplum Systems
Redistributing Append-Only and Compressed Tables
Redistributing Tables with Primary Key Constraints
Redistributing Tables with User-Defined Data Types
Redistributing Partitioned Tables
Preparing and Adding Nodes
Adding New Nodes to the Trusted Host Environment
Verifying OS Settings
Validating Disk I/O and Memory Bandwidth
Integrating New Hardware into the System
Initializing New Segments
Creating an Input File for System Expansion
Creating an input file in Interactive Mode
Expansion Input File Format
Running gpexpand to Initialize New Segments
Rolling Back an Failed Expansion Setup
Redistributing Tables
Ranking Tables for Redistribution
Redistributing Tables Using gpexpand
Monitoring Table Redistribution
Viewing Expansion Status
Viewing Table Status
Removing the Expansion Schema
Monitoring a Greenplum System
Monitoring Database Activity and Performance
Creating the Performance Monitor Database
Creating the Performance Monitor Superuser Role
Configuring the postgresql.conf File
Monitoring System State
Enabling System Alerts and Notifications
Using SNMP with a Greenplum Database System
Enabling Email Notifications
Testing Email Notifications
Checking System State
Viewing Master and Segment Status and Configuration
Viewing Your Mirroring Configuration
Checking Disk Space Usage
Checking Sizing of Distributed Databases and Tables
Checking for Data Distribution Skew
Viewing a Table’s Distribution Key
Viewing Data Distribution
Checking for Query Processing Skew
Viewing Metadata Information about Database Objects
Viewing the Last Operation Performed
Viewing the Definition of an Object
Viewing the Database Server Log Files
Log File Format
Searching the Greenplum Database Server Log Files
Using gp_toolkit
19. Routine System Maintenance Tasks
Routine Vacuum and Analyze
Transaction ID Management
System Catalog Maintenance
Regular System Catalog Maintenance
Intensive System Catalog Maintenance
Vacuum and Analyze for Query Optimization
Routine Reindexing
Managing Greenplum Database Log Files
Database Server Log Files
Management Utility Log Files
Section V: Performance Tuning
20. Defining Database Performance
Understanding the Performance Factors
System Resources
Workload
Throughput
Contention
Optimization
Determining Acceptable Performance
Baseline Hardware Performance
Performance Benchmarks
21. Common Causes of Performance Issues
Identifying Hardware and Segment Failures
Managing Workload
Avoiding Contention
Maintaining Database Statistics
Identifying Statistics Problems in Query Plans
Tuning Statistics Collection
Optimizing Data Distribution
Optimizing Your Database Design
Greenplum Database Maximum Limits
22. Investigating a Performance Problem
Checking System State
Checking Database Activity
Checking for Active Sessions (Workload)
Checking for Locks (Contention)
Checking Query Status and System Utilization
Troubleshooting Problem Queries
Investigating Error Messages
Gathering Information for Greenplum Support
Section VI: Extending Greenplum Database
23 Using Greenplum MapReduce
About Greenplum MapReduce
The Basics of MapReduce
How Greenplum MapReduce Works
Programming Greenplum MapReduce
Defining Inputs
External File Inputs
Database Inputs
OS Executable Inputs
Defining Map Functions
Map Functions in PERL
Map Functions in Python
Defining Reduce Functions
Defining Outputs
Table Outputs
File Outputs
Defining Tasks
Putting Together a Complete MapReduce Specification
Submitting MapReduce Jobs for Execution
Creating the Languages in the Database
Installing Library Files (for Custom C Functions)
Troubleshooting Problems with MapReduce Jobs
Language Does Not Exist
Generic Python Iterator Error
Function Defined Using Wrong MODE
Section VII: References
A. SQL Command Reference
SQL Syntax Summary
ABORT
ALTER AGGREGATE
ALTER CONVERSION
ALTER DATABASE
ALTER DOMAIN
ALTER EXTERNAL TABLE
ALTER FILESPACE
ALTER FUNCTION
ALTER GROUP
ALTER INDEX
ALTER LANGUAGE
ALTER OPERATOR
ALTER OPERATOR CLASS
ALTER RESOURCE QUEUE
ALTER ROLE
ALTER SCHEMA
ALTER SEQUENCE
ALTER TABLE
ALTER TABLESPACE
ALTER TRIGGER
ALTER TYPE
ALTER USER
ANALYZE
BEGIN
CHECKPOINT
CLOSE
CLUSTER
COMMENT
COMMIT
COPY
CREATE AGGREGATE
CREATE CAST
CREATE CONVERSION
CREATE DATABASE
CREATE DOMAIN
CREATE EXTERNAL TABLE
CREATE FUNCTION
CREATE GROUP
CREATE INDEX
CREATE LANGUAGE
CREATE OPERATOR
CREATE OPERATOR CLASS
CREATE RESOURCE QUEUE
CREATE ROLE
CREATE RULE
CREATE SCHEMA
CREATE SEQUENCE
CREATE TABLE
CREATE TABLE AS
CREATE TABLESPACE
CREATE TRIGGER
CREATE TYPE
CREATE USER
CREATE VIEW
DEALLOCATE
DECLARE
DELETE
DROP AGGREGATE
DROP CAST
DROP CONVERSION
DROP DATABASE
DROP DOMAIN
DROP EXTERNAL TABLE
DROP FILESPACE
DROP FUNCTION
DROP GROUP
DROP INDEX
DROP LANGUAGE
DROP OPERATOR
DROP OPERATOR CLASS
DROP OWNED
DROP RESOURCE QUEUE
DROP ROLE
DROP RULE
DROP SCHEMA
DROP SEQUENCE
DROP TABLE
DROP TABLESPACE
DROP TRIGGER
DROP TYPE
DROP USER
DROP VIEW
END
EXECUTE
EXPLAIN
FETCH
GRANT
INSERT
LOAD
LOCK
MOVE
PREPARE
REASSIGN OWNED
REINDEX
RELEASE SAVEPOINT
RESET
REVOKE
ROLLBACK
ROLLBACK TO SAVEPOINT
SAVEPOINT
SELECT
SELECT INTO
SET
SET ROLE
SET SESSION AUTHORIZATION
SET TRANSACTION
SHOW
START TRANSACTION
TRUNCATE
UPDATE
VACUUM
VALUES
B. Management Utility Reference
Backend Server Programs
Management Utility Summary
gp_dump
gp_restore
gpaddmirrors
gpactivatestandby
gpbitmapreindex
gpcheckos
gpcheckperf
gpconfig
gpcrondump
gpdbrestore
gpdeletesystem
gpdetective
gpexpand
gpfdist
gpfilespace
gpinitstandby
gpinitsystem
gpload
gplogfilter
gpmapreduce
gpmigrator
gprecoverseg
gpscp
gpsizecalc
gpsnmpd
gpssh
gpssh-exkeys
gpstart
gpstate
gpstop
gpsys1
C. Client Utility Reference
Client Utility Summary
clusterdb
createdb
createlang
createuser
dropdb
droplang
dropuser
ecpg
pg_config
pg_dump
pg_dumpall
pg_restore
psql
reindexdb
vacuumdb
D. Server Configuration Parameters
add_missing_from
array_nulls
authentication_timeout
backslash_quote
block_size
bonjour_name
check_function_bodies
client_encoding
client_min_messages
cpu_index_tuple_cost
cpu_operator_cost
cpu_tuple_cost
cursor_tuple_fraction
custom_variable_classes
DateStyle
db_user_namespace
deadlock_timeout
debug_assertions
debug_pretty_print
debug_print_parse
debug_print_plan
debug_print_prelim_plan
debug_print_rewritten
debug_print_slice_table
default_statistics_target
default_tablespace
default_transaction_isolation
default_transaction_read_only
dynamic_library_path
effective_cache_size
enable_bitmapscan
enable_groupagg
enable_hashagg
enable_hashjoin
enable_indexscan
enable_mergejoin
enable_nestloop
enable_seqscan
enable_sort
enable_tidscan
escape_string_warning
explain_pretty_print
extra_float_digits
from_collapse_limit
gp_adjust_selectivity_for_outerjoins
gp_analyze_relative_error
gp_autostats_mode
gp_autostats_on_change_threshold
gp_cached_segworkers_threshold
gp_command_count
gp_connections_per_thread
gp_content
gp_dbid
gp_debug_linger
gp_email_from
gp_email_smtp_password
gp_email_smtp_server
gp_email_smtp_userid
gp_email_to
gp_enable_adaptive_nestloop
gp_enable_agg_distinct
gp_enable_agg_distinct_pruning
gp_enable_direct_dispatch
gp_enable_fallback_plan
gp_enable_fast_sri
gp_enable_gpperfmon
gp_enable_groupext_distinct_gather
gp_enable_groupext_distinct_pruning
gp_enable_multiphase_agg
gp_enable_predicate_propagation
gp_enable_preunique
gp_enable_sequential_window_plans
gp_enable_sort_distinct
gp_enable_sort_limit
gp_external_enable_exec
gp_external_grant_privileges
gp_external_max_segs
gp_fts_probe_interval
gp_fts_probe_threadcount
gp_gpperfmon_send_interval
gp_hashjoin_tuples_per_bucket
gp_interconnect_hash_multiplier
gp_interconnect_queue_depth
gp_interconnect_setup_timeout
gp_interconnect_type
gp_log_format
gp_max_csv_line_length
gp_max_databases
gp_max_filespaces
gp_max_local_distributed_cache
gp_max_packet_size
gp_max_tablespaces
gp_motion_cost_per_row
gp_num_contents_in_cluster
gp_reject_percent_threshold
gp_reraise_signal
gp_resqueue_priority
gp_resqueue_priority_cpucores_per_s egment
gp_resqueue_priority_sweeper_interva l
gp_role
gp_safefswritesize
gp_segment_connect_timeout
gp_segments_for_planner
gp_segworker_relative_priority
gp_session_id
gp_set_proc_affinity
gp_set_read_only
gp_snmp_community
gp_snmp_monitor_address
gp_snmp_use_inform_or_trap
gp_statistics_pullup_from_child_ partition
gp_statistics_use_fkeys
gp_vmem_idle_resource_timeout
gp_vmem_protect_limit
gp_vmem_protect_segworker_cache_li mit
gp_workfile_compress_algorithm
gpperfmon_port
integer_datetimes
IntervalStyle
join_collapse_limit
krb_caseins_users
krb_server_keyfile
krb_srvname
lc_collate
lc_ctype
lc_messages
lc_monetary
lc_numeric
lc_time
listen_addresses
local_preload_libraries
log_autostats
log_connections
log_disconnections
log_dispatch_stats
log_duration
log_error_verbosity
log_executor_stats
log_hostname
log_min_duration_statement
log_min_error_statement
log_min_messages
log_parser_stats
log_planner_stats
log_rotation_age
log_rotation_size
log_statement
log_statement_stats
log_timezone
log_truncate_on_rotation
maintenance_work_mem
max_appendonly_tables
max_connections
max_files_per_process
max_fsm_pages
max_fsm_relations
max_function_args
max_identifier_length
max_index_keys
max_locks_per_transaction
max_prepared_transactions
max_resource_portals_per_transaction
max_resource_queues
max_stack_depth
max_work_mem
password_encryption
port
random_page_cost
regex_flavor
resource_cleanup_gangs_on_wait
resource_select_only
search_path
seq_page_cost
server_encoding
server_version
server_version_num
shared_buffers
shared_preload_libraries
ssl
ssl_ciphers
standard_conforming_strings
statement_timeout
stats_queue_level
superuser_reserved_connections
tcp_keepalives_count
tcp_keepalives_idle
tcp_keepalives_interval
temp_buffers
TimeZone
timezone_abbreviations
track_activities
track_counts
transaction_isolation
transaction_read_only
transform_null_equals
unix_socket_directory
unix_socket_group
unix_socket_permissions
update_process_title
vacuum_cost_delay
vacuum_cost_limit
vacuum_cost_page_dirty
vacuum_cost_page_hit
vacuum_cost_page_miss
vacuum_freeze_min_age
work_mem
E. Greenplum MapReduce Specification
Greenplum MapReduce Document Format
Greenplum MapReduce Document Schema
Example Greenplum MapReduce Document
MapReduce Flow Diagram
F. Greenplum Environment Variables
Required Environment Variables
Optional Environment Variables
G. Greenplum Database Data Types
H. System Catalog Reference
I. The gp_toolkit Administrative Schema
gp_toolkit Views
gp_bloat_diag
gp_bloat_expected_pages
gp_log_command_timings
gp_log_database
gp_log_master_concise
gp_log_system
gp_resq_activity
gp_resq_activity_by_queue
gp_skew_coefficients
gp_skew_idle_fractions
gp_statement_priorities
gp_stats_missing
gp_toolkit Functions
gp_param_setting('parameter_name')
Example:
gp_toolkit_version()
gp_toolkit External Tables
gp_disk_free
J. Character Set Support
Setting the Character Set
Character Set Conversion Between Server and Client
K. SQL 2008 Optional Feature Compliance
Glossary
Index
Symbols
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W