logo资料库

greenplum.pdf

第1页 / 共987页
第2页 / 共987页
第3页 / 共987页
第4页 / 共987页
第5页 / 共987页
第6页 / 共987页
第7页 / 共987页
第8页 / 共987页
资料共987页,剩余部分请下载后查看
Preface
About This Guide
Greenplum Database Documentation
Document Conventions
Text Conventions
Command Syntax Conventions
Naming Conventions and Acronyms
Contact Us
Section I: Introduction to Greenplum
1. About the Greenplum Architecture
About the Greenplum Master
About the Greenplum Segments
About the Greenplum Interconnect
About Redundancy and Failover in Greenplum Database
About Segment Mirroring
Segment Failover and Recovery
About Master Mirroring
About Interconnect Redundancy
About Parallel Data Loading
About Management and Monitoring
2. About Distributed Databases
Understanding How Data is Stored
Understanding Greenplum Distribution Policies
3. Summary of Greenplum Features
System Requirements
Feature Summary
Greenplum SQL Standard Conformance
Core SQL Conformance
SQL 1992 Conformance
SQL 1999 Conformance
SQL 2003 Conformance
SQL 2008 Conformance
Greenplum and PostgreSQL Compatibility
4. About Greenplum Query Processing
Understanding Query Planning and Dispatch
Understanding Greenplum Query Plans
Understanding Parallel Query Execution
Section II: Access Control and Security
5. Managing Roles and Privileges
Security Best Practices for Roles and Privileges
Creating New Roles (Users)
Altering Role Attributes
Creating Groups (Role Membership)
Managing Object Privileges
Simulating Row and Column Level Access Control
Encrypting Data
6. Configuring Client Authentication
Allowing Connections to Greenplum Database
Editing the pg_hba.conf File
Limiting Concurrent Connections
Encrypting Client/Server Connections
7. Accessing the Database
Establishing a Database Session
Supported Client Applications
Greenplum Database Client Applications
Connecting with psql
pgAdmin III for Greenplum Database
Installing pgAdmin III for Greenplum Database
Documentation for pgAdmin III for Greenplum Database
Performing Administrative Tasks with pgAdmin III
Database Application Interfaces
Third-Party Client Tools
Troubleshooting Connection Problems
8. Managing Workload and Resources
Overview of Greenplum Workload Management
How Resource Queues Work in Greenplum Database
How Resource Queue Limits Work
How Resource Queue Priorities Work
Types of Queries Evaluated for Resource Queues
Steps to Enable Workload Management
Configuring Workload Management
Creating Resource Queues
Creating Queues with an Active Query Limit
Creating Queues with a Query Planner Cost Limits
Allowing Queries to Run on Idle Systems
Allowing Small Queries to Bypass Queue Limits
Setting Priority Levels
Assigning Roles (Users) to a Resource Queue
Removing a Role from a Resource Queue
Modifying Resource Queues
Altering a Resource Queue
Dropping a Resource Queue
Checking Resource Queue Status
Viewing Queued Statements and Resource Queue Status
Viewing Resource Queue Statistics
Viewing the Roles Assigned to a Resource Queue
Viewing the Waiting Queries for a Resource Queue
Clearing a Waiting Statement From a Resource Queue
Viewing the Priority of Active Statements
Resetting the Priority of an Active Statement
Section III: Database Administration
9. Defining Database Objects
Creating and Managing Databases
About Template Databases
Creating a Database
Cloning a Database
Viewing the List of Databases
Altering a Database
Dropping a Database
Creating and Managing Tablespaces
Creating a Filespace
Creating a Tablespace
Using a Tablespace to Store Database Objects
Viewing Existing Tablespaces and Filespaces
Dropping Tablespaces and Filespaces
Creating and Managing Schemas
The Default ‘Public’ Schema
Creating a Schema
Schema Search Paths
Setting the Schema Search Path
Viewing the Current Schema
Dropping a Schema
System Schemas
Creating and Managing Tables
Creating a Table
Choosing Column Data Types
Setting Table and Column Constraints
Choosing the Table Distribution Policy
Choosing the Table Storage Model
Altering a Table
Altering Table Distribution
Altering the Table Storage Model
Dropping a Table
Partitioning Large Tables
Understanding Table Partitioning in Greenplum Database
Deciding on a Table Partitioning Strategy
Creating Partitioned Tables
Defining Date Range Table Partitions
Defining Numeric Range Table Partitions
Defining List Table Partitions
Defining Multi-level Partitions
Partitioning an Existing Table
Limitations of Partitioned Tables
Loading Partitioned Tables
Verifying Your Partition Strategy
Some Limitations of Selective Partition Scanning
Viewing Your Partition Design
Maintaining Partitioned Tables
Adding a New Partition
Renaming a Partition
Adding a Default Partition
Dropping a Partition
Truncating a Partition
Exchanging a Partition
Splitting a Partition
Modifying a Subpartition Template
Creating and Using Sequences
Creating a Sequence
Using a Sequence
Altering a Sequence
Dropping a Sequence
Using Indexes in Greenplum Database
Index Types
About Bitmap Indexes
Creating an Index
Examining Index Usage
Managing Indexes
Dropping an Index
Creating and Managing Views
Creating Views
Dropping Views
10. Managing Data
About Concurrency Control in Greenplum Database
Inserting New Rows
Updating Existing Rows
Deleting Rows
Truncating a Table
Working With Transactions
Transaction Isolation Levels
Vacuuming the Database
Configuring the Free Space Map
11. Querying Data
Defining Queries
SQL Lexicon
SQL Value Expressions
Column References
Positional Parameters
Subscripts
Field Selection
Operator Invocations
Function Calls
Aggregate Expressions
Window Expressions
Type Casts
Scalar Subqueries
Array Constructors
Row Constructors
Expression Evaluation Rules
Using Functions and Operators
Using Functions in Greenplum Database
User-Defined Functions
Built-in Functions and Operators
Window Functions
Advanced Analytic Functions
Advanced Analytic Function Examples
Query Profiling
Reading EXPLAIN Output
EXPLAIN Example
Reading EXPLAIN ANALYZE Output
EXPLAIN ANALYZE Example
What to Look for in a Query Plan
12. Loading and Unloading Data
About the Greenplum Database Loading Tools
About External Tables
About gpload
About COPY
Formatting Your Data Files
Formatting Rows
Formatting Columns
Representing NULL Values
Escaping
Escaping in Text Formatted Files
Escaping in CSV Formatted Files
Character Encoding
Loading Data into Greenplum Database
Creating and Using File-Based External Tables
Defining the External Table
Using the Greenplum Parallel File Server (gpfdist)
Creating and Using Web External Tables
Defining Command-Based Web External Tables
Defining URL-Based Web External Tables
Loading Data Using an External Table
Handling Load Errors
Defining an External Table with Single Row Error Isolation
Viewing Bad Rows in the Error Table
Loading Data with gpload
Loading Data with COPY
Running COPY in Single Row Error Isolation Mode
Data Loading Performance Tips
Unloading Data from Greenplum Database
Defining a File-Based Writable External Table
Defining a Command-Based Writable External Table
Unloading Data Using a Writable External Table
Unloading Data Using COPY
Section IV: System Administration
13. Starting and Stopping Greenplum
Overview
Starting Greenplum Database
Restarting Greenplum Database
Uploading Configuration File Changes Only
Starting the Master in Maintenance Mode
Stopping Greenplum Database
14. Configuring Your Greenplum System
About Greenplum Master, Global, and Local Parameters
Setting Configuration Parameters
Setting a Local Configuration Parameter
Setting a Global or Master Configuration Parameter
Setting Parameters at the System Level
Setting Parameters at the Database Level
Setting Parameters at the Role Level
Setting Parameters in a Session
Viewing Settings of Server Configuration Parameters
Configuration Parameter Categories
Connection and Authentication Parameters
Connection Parameters
Security and Authentication Parameters
System Resource Consumption Parameters
Memory Consumption Parameters
Free Space Map Parameters
OS Resource Parameters
Cost-Based Vacuum Delay Parameters
Query Tuning Parameters
Query Plan Operator Control Parameters
Query Planner Costing Parameters
Database Statistics Sampling Parameters
Sort Operator Configuration Parameters
Aggregate Operator Configuration Parameters
Join Operator Configuration Parameters
Other Query Planner Configuration Parameters
Error Reporting and Logging Parameters
Log Rotation
When to Log
What to Log
System Monitoring Parameters
SNMP Alerts
Email Alerts
Greenplum Performance Monitor Agent
Runtime Statistics Collection Parameters
Automatic Statistics Collection Parameters
Client Connection Default Parameters
Statement Behavior Parameters
Locale and Formatting Parameters
Other Client Default Parameters
Lock Management Parameters
Workoad Management Parameters
External Table Parameters
Append-Only Table Parameters
Database and Tablespace/Filespace Parameters
Past PostgreSQL Version Compatibility Parameters
Greenplum Array Configuration Parameters
Interconnect Configuration Parameters
Dispatch Configuration Parameters
Fault Operation Parameters
Distributed Transaction Management Parameters
Read-Only Parameters
15. Enabling High Availability Features
Overview of High Availability in Greenplum Database
Overview of Segment Mirroring
Overview of Master Mirroring
Overview of Fault Detection and Recovery
Enabling Mirroring in Greenplum Database
Enabling Segment Mirroring
Enabling Master Mirroring
Knowing When a Segment is Down
Enabling Alerts and Notifications
Checking for Failed Segments
Checking the Log Files
Recovering a Failed Segment
Recovering From Segment Failures
Recovering a Failed Master
Restoring Master Mirroring After a Recovery
16. Backing Up and Restoring Databases
Overview of Backup and Restore Operations
About Parallel Backups
About Non-Parallel Backups
About Parallel Restores
About Non-Parallel Restores
Backing Up a Database
Backing Up a Database with gp_dump
Automating Parallel Backups with gpcrondump
Restoring From Parallel Backup Files
Restoring a Database with gp_restore
Restoring a Database Using gpdbrestore
Restoring to a Different Greenplum System Configuration
17. Expanding a Greenplum System
Planning Greenplum System Expansion
System Expansion Overview
System Expansion Checklist
Planning New Hardware Platforms
Planning Initialization of New Segments
Planning Mirror Segments
Increasing Segments Per Host
About the Expansion Schema
Planning Table Redistribution
Managing Redistribution in Large-Scale Greenplum Systems
Redistributing Append-Only and Compressed Tables
Redistributing Tables with Primary Key Constraints
Redistributing Tables with User-Defined Data Types
Redistributing Partitioned Tables
Preparing and Adding Nodes
Adding New Nodes to the Trusted Host Environment
Verifying OS Settings
Validating Disk I/O and Memory Bandwidth
Integrating New Hardware into the System
Initializing New Segments
Creating an Input File for System Expansion
Creating an input file in Interactive Mode
Expansion Input File Format
Running gpexpand to Initialize New Segments
Rolling Back an Failed Expansion Setup
Redistributing Tables
Ranking Tables for Redistribution
Redistributing Tables Using gpexpand
Monitoring Table Redistribution
Viewing Expansion Status
Viewing Table Status
Removing the Expansion Schema
Monitoring a Greenplum System
Monitoring Database Activity and Performance
Creating the Performance Monitor Database
Creating the Performance Monitor Superuser Role
Configuring the postgresql.conf File
Monitoring System State
Enabling System Alerts and Notifications
Using SNMP with a Greenplum Database System
Enabling Email Notifications
Testing Email Notifications
Checking System State
Viewing Master and Segment Status and Configuration
Viewing Your Mirroring Configuration
Checking Disk Space Usage
Checking Sizing of Distributed Databases and Tables
Checking for Data Distribution Skew
Viewing a Table’s Distribution Key
Viewing Data Distribution
Checking for Query Processing Skew
Viewing Metadata Information about Database Objects
Viewing the Last Operation Performed
Viewing the Definition of an Object
Viewing the Database Server Log Files
Log File Format
Searching the Greenplum Database Server Log Files
Using gp_toolkit
19. Routine System Maintenance Tasks
Routine Vacuum and Analyze
Transaction ID Management
System Catalog Maintenance
Regular System Catalog Maintenance
Intensive System Catalog Maintenance
Vacuum and Analyze for Query Optimization
Routine Reindexing
Managing Greenplum Database Log Files
Database Server Log Files
Management Utility Log Files
Section V: Performance Tuning
20. Defining Database Performance
Understanding the Performance Factors
System Resources
Workload
Throughput
Contention
Optimization
Determining Acceptable Performance
Baseline Hardware Performance
Performance Benchmarks
21. Common Causes of Performance Issues
Identifying Hardware and Segment Failures
Managing Workload
Avoiding Contention
Maintaining Database Statistics
Identifying Statistics Problems in Query Plans
Tuning Statistics Collection
Optimizing Data Distribution
Optimizing Your Database Design
Greenplum Database Maximum Limits
22. Investigating a Performance Problem
Checking System State
Checking Database Activity
Checking for Active Sessions (Workload)
Checking for Locks (Contention)
Checking Query Status and System Utilization
Troubleshooting Problem Queries
Investigating Error Messages
Gathering Information for Greenplum Support
Section VI: Extending Greenplum Database
23 Using Greenplum MapReduce
About Greenplum MapReduce
The Basics of MapReduce
How Greenplum MapReduce Works
Programming Greenplum MapReduce
Defining Inputs
External File Inputs
Database Inputs
OS Executable Inputs
Defining Map Functions
Map Functions in PERL
Map Functions in Python
Defining Reduce Functions
Defining Outputs
Table Outputs
File Outputs
Defining Tasks
Putting Together a Complete MapReduce Specification
Submitting MapReduce Jobs for Execution
Creating the Languages in the Database
Installing Library Files (for Custom C Functions)
Troubleshooting Problems with MapReduce Jobs
Language Does Not Exist
Generic Python Iterator Error
Function Defined Using Wrong MODE
Section VII: References
A. SQL Command Reference
SQL Syntax Summary
ABORT
ALTER AGGREGATE
ALTER CONVERSION
ALTER DATABASE
ALTER DOMAIN
ALTER EXTERNAL TABLE
ALTER FILESPACE
ALTER FUNCTION
ALTER GROUP
ALTER INDEX
ALTER LANGUAGE
ALTER OPERATOR
ALTER OPERATOR CLASS
ALTER RESOURCE QUEUE
ALTER ROLE
ALTER SCHEMA
ALTER SEQUENCE
ALTER TABLE
ALTER TABLESPACE
ALTER TRIGGER
ALTER TYPE
ALTER USER
ANALYZE
BEGIN
CHECKPOINT
CLOSE
CLUSTER
COMMENT
COMMIT
COPY
CREATE AGGREGATE
CREATE CAST
CREATE CONVERSION
CREATE DATABASE
CREATE DOMAIN
CREATE EXTERNAL TABLE
CREATE FUNCTION
CREATE GROUP
CREATE INDEX
CREATE LANGUAGE
CREATE OPERATOR
CREATE OPERATOR CLASS
CREATE RESOURCE QUEUE
CREATE ROLE
CREATE RULE
CREATE SCHEMA
CREATE SEQUENCE
CREATE TABLE
CREATE TABLE AS
CREATE TABLESPACE
CREATE TRIGGER
CREATE TYPE
CREATE USER
CREATE VIEW
DEALLOCATE
DECLARE
DELETE
DROP AGGREGATE
DROP CAST
DROP CONVERSION
DROP DATABASE
DROP DOMAIN
DROP EXTERNAL TABLE
DROP FILESPACE
DROP FUNCTION
DROP GROUP
DROP INDEX
DROP LANGUAGE
DROP OPERATOR
DROP OPERATOR CLASS
DROP OWNED
DROP RESOURCE QUEUE
DROP ROLE
DROP RULE
DROP SCHEMA
DROP SEQUENCE
DROP TABLE
DROP TABLESPACE
DROP TRIGGER
DROP TYPE
DROP USER
DROP VIEW
END
EXECUTE
EXPLAIN
FETCH
GRANT
INSERT
LOAD
LOCK
MOVE
PREPARE
REASSIGN OWNED
REINDEX
RELEASE SAVEPOINT
RESET
REVOKE
ROLLBACK
ROLLBACK TO SAVEPOINT
SAVEPOINT
SELECT
SELECT INTO
SET
SET ROLE
SET SESSION AUTHORIZATION
SET TRANSACTION
SHOW
START TRANSACTION
TRUNCATE
UPDATE
VACUUM
VALUES
B. Management Utility Reference
Backend Server Programs
Management Utility Summary
gp_dump
gp_restore
gpaddmirrors
gpactivatestandby
gpbitmapreindex
gpcheckos
gpcheckperf
gpconfig
gpcrondump
gpdbrestore
gpdeletesystem
gpdetective
gpexpand
gpfdist
gpfilespace
gpinitstandby
gpinitsystem
gpload
gplogfilter
gpmapreduce
gpmigrator
gprecoverseg
gpscp
gpsizecalc
gpsnmpd
gpssh
gpssh-exkeys
gpstart
gpstate
gpstop
gpsys1
C. Client Utility Reference
Client Utility Summary
clusterdb
createdb
createlang
createuser
dropdb
droplang
dropuser
ecpg
pg_config
pg_dump
pg_dumpall
pg_restore
psql
reindexdb
vacuumdb
D. Server Configuration Parameters
add_missing_from
array_nulls
authentication_timeout
backslash_quote
block_size
bonjour_name
check_function_bodies
client_encoding
client_min_messages
cpu_index_tuple_cost
cpu_operator_cost
cpu_tuple_cost
cursor_tuple_fraction
custom_variable_classes
DateStyle
db_user_namespace
deadlock_timeout
debug_assertions
debug_pretty_print
debug_print_parse
debug_print_plan
debug_print_prelim_plan
debug_print_rewritten
debug_print_slice_table
default_statistics_target
default_tablespace
default_transaction_isolation
default_transaction_read_only
dynamic_library_path
effective_cache_size
enable_bitmapscan
enable_groupagg
enable_hashagg
enable_hashjoin
enable_indexscan
enable_mergejoin
enable_nestloop
enable_seqscan
enable_sort
enable_tidscan
escape_string_warning
explain_pretty_print
extra_float_digits
from_collapse_limit
gp_adjust_selectivity_for_outerjoins
gp_analyze_relative_error
gp_autostats_mode
gp_autostats_on_change_threshold
gp_cached_segworkers_threshold
gp_command_count
gp_connections_per_thread
gp_content
gp_dbid
gp_debug_linger
gp_email_from
gp_email_smtp_password
gp_email_smtp_server
gp_email_smtp_userid
gp_email_to
gp_enable_adaptive_nestloop
gp_enable_agg_distinct
gp_enable_agg_distinct_pruning
gp_enable_direct_dispatch
gp_enable_fallback_plan
gp_enable_fast_sri
gp_enable_gpperfmon
gp_enable_groupext_distinct_gather
gp_enable_groupext_distinct_pruning
gp_enable_multiphase_agg
gp_enable_predicate_propagation
gp_enable_preunique
gp_enable_sequential_window_plans
gp_enable_sort_distinct
gp_enable_sort_limit
gp_external_enable_exec
gp_external_grant_privileges
gp_external_max_segs
gp_fts_probe_interval
gp_fts_probe_threadcount
gp_gpperfmon_send_interval
gp_hashjoin_tuples_per_bucket
gp_interconnect_hash_multiplier
gp_interconnect_queue_depth
gp_interconnect_setup_timeout
gp_interconnect_type
gp_log_format
gp_max_csv_line_length
gp_max_databases
gp_max_filespaces
gp_max_local_distributed_cache
gp_max_packet_size
gp_max_tablespaces
gp_motion_cost_per_row
gp_num_contents_in_cluster
gp_reject_percent_threshold
gp_reraise_signal
gp_resqueue_priority
gp_resqueue_priority_cpucores_per_s egment
gp_resqueue_priority_sweeper_interva l
gp_role
gp_safefswritesize
gp_segment_connect_timeout
gp_segments_for_planner
gp_segworker_relative_priority
gp_session_id
gp_set_proc_affinity
gp_set_read_only
gp_snmp_community
gp_snmp_monitor_address
gp_snmp_use_inform_or_trap
gp_statistics_pullup_from_child_ partition
gp_statistics_use_fkeys
gp_vmem_idle_resource_timeout
gp_vmem_protect_limit
gp_vmem_protect_segworker_cache_li mit
gp_workfile_compress_algorithm
gpperfmon_port
integer_datetimes
IntervalStyle
join_collapse_limit
krb_caseins_users
krb_server_keyfile
krb_srvname
lc_collate
lc_ctype
lc_messages
lc_monetary
lc_numeric
lc_time
listen_addresses
local_preload_libraries
log_autostats
log_connections
log_disconnections
log_dispatch_stats
log_duration
log_error_verbosity
log_executor_stats
log_hostname
log_min_duration_statement
log_min_error_statement
log_min_messages
log_parser_stats
log_planner_stats
log_rotation_age
log_rotation_size
log_statement
log_statement_stats
log_timezone
log_truncate_on_rotation
maintenance_work_mem
max_appendonly_tables
max_connections
max_files_per_process
max_fsm_pages
max_fsm_relations
max_function_args
max_identifier_length
max_index_keys
max_locks_per_transaction
max_prepared_transactions
max_resource_portals_per_transaction
max_resource_queues
max_stack_depth
max_work_mem
password_encryption
port
random_page_cost
regex_flavor
resource_cleanup_gangs_on_wait
resource_select_only
search_path
seq_page_cost
server_encoding
server_version
server_version_num
shared_buffers
shared_preload_libraries
ssl
ssl_ciphers
standard_conforming_strings
statement_timeout
stats_queue_level
superuser_reserved_connections
tcp_keepalives_count
tcp_keepalives_idle
tcp_keepalives_interval
temp_buffers
TimeZone
timezone_abbreviations
track_activities
track_counts
transaction_isolation
transaction_read_only
transform_null_equals
unix_socket_directory
unix_socket_group
unix_socket_permissions
update_process_title
vacuum_cost_delay
vacuum_cost_limit
vacuum_cost_page_dirty
vacuum_cost_page_hit
vacuum_cost_page_miss
vacuum_freeze_min_age
work_mem
E. Greenplum MapReduce Specification
Greenplum MapReduce Document Format
Greenplum MapReduce Document Schema
Example Greenplum MapReduce Document
MapReduce Flow Diagram
F. Greenplum Environment Variables
Required Environment Variables
Optional Environment Variables
G. Greenplum Database Data Types
H. System Catalog Reference
I. The gp_toolkit Administrative Schema
gp_toolkit Views
gp_bloat_diag
gp_bloat_expected_pages
gp_log_command_timings
gp_log_database
gp_log_master_concise
gp_log_system
gp_resq_activity
gp_resq_activity_by_queue
gp_skew_coefficients
gp_skew_idle_fractions
gp_statement_priorities
gp_stats_missing
gp_toolkit Functions
gp_param_setting('parameter_name')
Example:
gp_toolkit_version()
gp_toolkit External Tables
gp_disk_free
J. Character Set Support
Setting the Character Set
Character Set Conversion Between Server and Client
K. SQL 2008 Optional Feature Compliance
Glossary
Index
Symbols
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
Technical Publications Greenplum Database 4.0 Administrator Guide Last Revised: May 15, 2010 3:40 pm
Greenplum Database Administrator Guide 4.0 - Contents Greenplum Database Administrator Guide 4.0 - Contents Preface ............................................................................................... 1 About This Guide.............................................................................. 1 Greenplum Database Documentation ............................................... 2 Document Conventions............................................................... 2 Contact Us ....................................................................................... 4 ........................................................................................................ 4 Section I: Introduction to Greenplum Chapter 1: About the Greenplum Architecture........................ 6 About the Greenplum Master............................................................ 7 About the Greenplum Segments....................................................... 7 About the Greenplum Interconnect .................................................. 7 About Redundancy and Failover in Greenplum Database .................. 8 About Segment Mirroring............................................................ 8 About Master Mirroring ............................................................... 9 About Interconnect Redundancy ................................................. 9 About Parallel Data Loading ............................................................10 About Management and Monitoring .................................................10 Chapter 2: About Distributed Databases .................................12 Understanding How Data is Stored..................................................12 Understanding Greenplum Distribution Policies................................13 Chapter 3: Summary of Greenplum Features .........................14 System Requirements .....................................................................14 Feature Summary ...........................................................................15 Greenplum SQL Standard Conformance.....................................15 Greenplum and PostgreSQL Compatibility..................................19 Chapter 4: About Greenplum Query Processing ....................26 Understanding Query Planning and Dispatch ...................................26 Understanding Greenplum Query Plans ...........................................27 Understanding Parallel Query Execution ..........................................28 Section II: Access Control and Security Chapter 5: Managing Roles and Privileges ..............................31 Security Best Practices for Roles and Privileges ...............................31 Creating New Roles (Users).............................................................32 Altering Role Attributes..............................................................32 Creating Groups (Role Membership)................................................33 Managing Object Privileges .............................................................34 Simulating Row and Column Level Access Control .....................35 Encrypting Data ..............................................................................35 Chapter 6: Configuring Client Authentication.........................37 Allowing Connections to Greenplum Database.................................37 Editing the pg_hba.conf File.......................................................38 Limiting Concurrent Connections .....................................................39 Encrypting Client/Server Connections .............................................40 Table of Contents ii
Greenplum Database Administrator Guide 4.0 - Contents Chapter 7: Accessing the Database ...........................................41 Establishing a Database Session .....................................................41 Supported Client Applications..........................................................42 Greenplum Database Client Applications....................................42 pgAdmin III for Greenplum Database ........................................43 Database Application Interfaces.................................................46 Third-Party Client Tools .............................................................47 Troubleshooting Connection Problems .............................................48 Chapter 8: Managing Workload and Resources .....................49 Overview of Greenplum Workload Management ..............................49 How Resource Queues Work in Greenplum Database.................49 Steps to Enable Workload Management .....................................52 Configuring Workload Management .................................................53 Creating Resource Queues ..............................................................54 Creating Queues with an Active Query Limit ..............................54 Creating Queues with a Query Planner Cost Limits ....................54 Setting Priority Levels................................................................55 Assigning Roles (Users) to a Resource Queue..................................56 Removing a Role from a Resource Queue ..................................56 Modifying Resource Queues.............................................................56 Altering a Resource Queue.........................................................57 Dropping a Resource Queue ......................................................57 Checking Resource Queue Status ....................................................57 Viewing Queued Statements and Resource Queue Status ..........57 Viewing Resource Queue Statistics ............................................58 Viewing the Roles Assigned to a Resource Queue ......................58 Viewing the Waiting Queries for a Resource Queue....................58 Clearing a Waiting Statement From a Resource Queue ..............59 Viewing the Priority of Active Statements ..................................60 Resetting the Priority of an Active Statement.............................60 Section III: Database Administration Chapter 9: Defining Database Objects......................................62 Creating and Managing Databases ..................................................62 About Template Databases ........................................................62 Creating a Database ..................................................................62 Viewing the List of Databases ....................................................63 Altering a Database ...................................................................63 Dropping a Database .................................................................63 Creating and Managing Tablespaces................................................64 Creating a Filespace...................................................................64 Creating a Tablespace ...............................................................65 Using a Tablespace to Store Database Objects ..........................65 Viewing Existing Tablespaces and Filespaces .............................66 Dropping Tablespaces and Filespaces ........................................66 Creating and Managing Schemas.....................................................66 The Default ‘Public’ Schema.......................................................67 Creating a Schema ....................................................................67 Schema Search Paths ................................................................67 Table of Contents iii
Greenplum Database Administrator Guide 4.0 - Contents Dropping a Schema ...................................................................68 System Schemas .......................................................................68 Creating and Managing Tables ........................................................69 Creating a Table ........................................................................69 Altering a Table .........................................................................76 Dropping a Table .......................................................................77 Partitioning Large Tables .................................................................77 Understanding Table Partitioning in Greenplum Database ..........77 Deciding on a Table Partitioning Strategy ..................................78 Creating Partitioned Tables ........................................................79 Loading Partitioned Tables .........................................................83 Verifying Your Partition Strategy................................................83 Viewing Your Partition Design ....................................................84 Maintaining Partitioned Tables ...................................................84 Creating and Using Sequences ........................................................88 Creating a Sequence..................................................................88 Using a Sequence ......................................................................88 Altering a Sequence...................................................................89 Dropping a Sequence.................................................................89 Using Indexes in Greenplum Database ............................................89 Index Types...............................................................................91 Creating an Index......................................................................92 Examining Index Usage .............................................................93 Managing Indexes .....................................................................94 Dropping an Index.....................................................................94 Creating and Managing Views..........................................................94 Creating Views...........................................................................94 Dropping Views..........................................................................94 Chapter 10: Managing Data .........................................................96 About Concurrency Control in Greenplum Database ........................96 Inserting New Rows ........................................................................97 Updating Existing Rows ...................................................................98 Deleting Rows .................................................................................98 Truncating a Table .....................................................................99 Working With Transactions..............................................................99 Transaction Isolation Levels.......................................................99 Vacuuming the Database .............................................................. 101 Configuring the Free Space Map .............................................. 101 Chapter 11: Querying Data ........................................................ 102 Defining Queries ........................................................................... 102 SQL Lexicon............................................................................. 102 SQL Value Expressions ............................................................ 102 Using Functions and Operators...................................................... 111 Using Functions in Greenplum Database .................................. 111 User-Defined Functions............................................................ 112 Built-in Functions and Operators.............................................. 112 Query Profiling .............................................................................. 119 Reading EXPLAIN Output ......................................................... 120 Table of Contents iv
Greenplum Database Administrator Guide 4.0 - Contents Reading EXPLAIN ANALYZE Output .......................................... 121 What to Look for in a Query Plan ............................................. 122 Chapter 12: Loading and Unloading Data .............................. 124 About the Greenplum Database Loading Tools .............................. 124 About External Tables.............................................................. 124 About gpload ........................................................................... 125 About COPY ............................................................................. 125 Formatting Your Data Files ............................................................ 125 Formatting Rows...................................................................... 125 Formatting Columns ................................................................ 126 Representing NULL Values ....................................................... 126 Escaping .................................................................................. 126 Character Encoding.................................................................. 127 Loading Data into Greenplum Database ........................................ 128 Creating and Using File-Based External Tables......................... 128 Creating and Using Web External Tables.................................. 132 Loading Data Using an External Table...................................... 133 Handling Load Errors ............................................................... 133 Loading Data with gpload ........................................................ 136 Loading Data with COPY .......................................................... 137 Data Loading Performance Tips ............................................... 138 Unloading Data from Greenplum Database.................................... 138 Defining a File-Based Writable External Table .......................... 138 Defining a Command-Based Writable External Table................ 139 Unloading Data Using a Writable External Table ...................... 139 Unloading Data Using COPY ..................................................... 140 Section IV: System Administration Chapter 13: Starting and Stopping Greenplum .................... 142 Overview....................................................................................... 142 Starting Greenplum Database ....................................................... 142 Restarting Greenplum Database .............................................. 142 Uploading Configuration File Changes Only .............................. 143 Starting the Master in Maintenance Mode ................................ 143 Stopping Greenplum Database ...................................................... 143 Chapter 14: Configuring Your Greenplum System .............. 145 About Greenplum Master, Global, and Local Parameters ............... 145 Setting Configuration Parameters.................................................. 145 Setting a Local Configuration Parameter .................................. 146 Setting a Global or Master Configuration Parameter ................ 146 Viewing Settings of Server Configuration Parameters ................... 147 Configuration Parameter Categories .............................................. 147 Connection and Authentication Parameters.............................. 148 System Resource Consumption Parameters ............................. 149 Query Tuning Parameters ........................................................ 150 Error Reporting and Logging Parameters ................................. 152 System Monitoring Parameters ................................................ 152 Runtime Statistics Collection Parameters ................................. 153 Table of Contents v
Greenplum Database Administrator Guide 4.0 - Contents Automatic Statistics Collection Parameters .............................. 153 Client Connection Default Parameters...................................... 154 Lock Management Parameters ................................................. 154 Workoad Management Parameters .......................................... 154 External Table Parameters ....................................................... 155 Append-Only Table Parameters................................................ 155 Database and Tablespace/Filespace Parameters ...................... 155 Past PostgreSQL Version Compatibility Parameters.................. 155 Greenplum Array Configuration Parameters............................. 155 Chapter 15: Enabling High Availability Features ................. 157 Overview of High Availability in Greenplum Database.................... 157 Overview of Segment Mirroring ............................................... 157 Overview of Master Mirroring................................................... 158 Overview of Fault Detection and Recovery ............................... 159 Enabling Mirroring in Greenplum Database.................................... 160 Enabling Segment Mirroring..................................................... 160 Enabling Master Mirroring ........................................................ 161 Knowing When a Segment is Down ............................................... 162 Enabling Alerts and Notifications.............................................. 162 Checking for Failed Segments.................................................. 162 Checking the Log Files ............................................................. 163 Recovering a Failed Segment ........................................................ 163 Recovering From Segment Failures.......................................... 164 Recovering a Failed Master............................................................ 166 Restoring Master Mirroring After a Recovery ............................ 167 Chapter 16: Backing Up and Restoring Databases.............. 169 Overview of Backup and Restore Operations ................................. 169 About Parallel Backups ............................................................ 169 About Non-Parallel Backups ..................................................... 170 About Parallel Restores ............................................................ 170 About Non-Parallel Restores .................................................... 171 Backing Up a Database ................................................................. 171 Backing Up a Database with gp_dump..................................... 172 Automating Parallel Backups with gpcrondump........................ 173 Restoring From Parallel Backup Files ............................................. 174 Restoring a Database with gp_restore ..................................... 174 Restoring a Database Using gpdbrestore ................................. 175 Restoring to a Different Greenplum System Configuration ....... 176 Chapter 17: Expanding a Greenplum System ....................... 178 Planning Greenplum System Expansion......................................... 178 System Expansion Overview .................................................... 178 System Expansion Checklist .................................................... 179 Planning New Hardware Platforms ........................................... 181 Planning Initialization of New Segments .................................. 181 Planning Table Redistribution................................................... 182 Preparing and Adding Nodes ......................................................... 185 Adding New Nodes to the Trusted Host Environment ............... 185 Verifying OS Settings............................................................... 187 Table of Contents vi
Greenplum Database Administrator Guide 4.0 - Contents Validating Disk I/O and Memory Bandwidth ............................. 187 Integrating New Hardware into the System ............................. 188 Initializing New Segments ............................................................. 188 Creating an Input File for System Expansion ........................... 188 Running gpexpand to Initialize New Segments ........................ 191 Rolling Back an Failed Expansion Setup ................................... 192 Redistributing Tables..................................................................... 192 Ranking Tables for Redistribution ............................................ 192 Redistributing Tables Using gpexpand...................................... 193 Monitoring Table Redistribution................................................ 193 Removing the Expansion Schema.................................................. 194 Appendix A: Monitoring a Greenplum System ........................... 195 Monitoring Database Activity and Performance.............................. 195 Creating the Performance Monitor Database ............................ 195 Creating the Performance Monitor Superuser Role ................... 196 Configuring the postgresql.conf File ......................................... 197 Monitoring System State ............................................................... 198 Enabling System Alerts and Notifications ................................. 198 Checking System State............................................................ 203 Checking Disk Space Usage ..................................................... 204 Checking for Data Distribution Skew........................................ 205 Viewing Metadata Information about Database Objects ........... 206 Viewing the Database Server Log Files .......................................... 207 Log File Format........................................................................ 207 Searching the Greenplum Database Server Log Files ............... 208 Using gp_toolkit ............................................................................ 208 Chapter 18: Routine System Maintenance Tasks................. 209 Routine Vacuum and Analyze ........................................................ 209 Transaction ID Management .................................................... 209 System Catalog Maintenance ................................................... 209 Vacuum and Analyze for Query Optimization ........................... 210 Routine Reindexing ....................................................................... 210 Managing Greenplum Database Log Files ...................................... 211 Database Server Log Files ....................................................... 211 Management Utility Log Files ................................................... 211 Section V: Performance Tuning Chapter 19: Defining Database Performance ....................... 213 Understanding the Performance Factors ........................................ 213 System Resources ................................................................... 213 Workload ................................................................................. 213 Throughput.............................................................................. 213 Contention............................................................................... 214 Optimization ............................................................................ 214 Determining Acceptable Performance ............................................ 214 Baseline Hardware Performance .............................................. 214 Performance Benchmarks ........................................................ 214 Table of Contents vii
Greenplum Database Administrator Guide 4.0 - Contents Chapter 20: Common Causes of Performance Issues......... 216 Identifying Hardware and Segment Failures .................................. 216 Managing Workload....................................................................... 217 Avoiding Contention ...................................................................... 217 Maintaining Database Statistics..................................................... 217 Identifying Statistics Problems in Query Plans ......................... 217 Tuning Statistics Collection ...................................................... 218 Optimizing Data Distribution ......................................................... 218 Optimizing Your Database Design.................................................. 218 Greenplum Database Maximum Limits ..................................... 219 Chapter 21: Investigating a Performance Problem ............ 220 Checking System State ................................................................. 220 Checking Database Activity ........................................................... 220 Checking for Active Sessions (Workload) ................................. 220 Checking for Locks (Contention) .............................................. 220 Checking Query Status and System Utilization......................... 221 Troubleshooting Problem Queries .................................................. 221 Investigating Error Messages ........................................................ 221 Gathering Information for Greenplum Support......................... 222 Section VI: Extending Greenplum Database Chapter 22: Using Greenplum MapReduce ............................ 224 About Greenplum MapReduce ....................................................... 224 The Basics of MapReduce......................................................... 224 How Greenplum MapReduce Works.......................................... 225 Programming Greenplum MapReduce............................................ 226 Defining Inputs ........................................................................ 226 Defining Map Functions............................................................ 229 Defining Reduce Functions....................................................... 231 Defining Outputs...................................................................... 233 Defining Tasks ......................................................................... 234 Putting Together a Complete MapReduce Specification ............ 235 Submitting MapReduce Jobs for Execution .................................... 236 Troubleshooting Problems with MapReduce Jobs ........................... 237 Language Does Not Exist ......................................................... 237 Generic Python Iterator Error .................................................. 237 Function Defined Using Wrong MODE....................................... 238 Section VII: References Appendix B: SQL Command Reference....................................... 242 SQL Syntax Summary ................................................................... 244 ABORT .......................................................................................... 270 ALTER AGGREGATE ....................................................................... 271 ALTER CONVERSION ..................................................................... 273 ALTER DATABASE.......................................................................... 274 ALTER DOMAIN ............................................................................. 276 ALTER EXTERNAL TABLE ............................................................... 278 ALTER FILESPACE ......................................................................... 281 Table of Contents viii
分享到:
收藏