Tuesday, April 12, 2011

Oracle Data Pump


Introduction:

Oracle introduced Data Pump in Oracle Database 10g Release 1. This technology enables very high-speed transfer of data from one database to another. Oracle Data Pump provides two utilities, namely:

1.      Data Pump Export which is invoked with the expdp command.
2.      Data Pump Import which is invoked with the impdp command.

The above two utilities have a similar look and feel to the pre-Oracle 10g export and import utilities (exp and imp), but they are completely separate. This means the dump files generated by the original export utility (exp) cannot be imported by the new Data Pump import utility (impdp), and vice versa.

Data Pump Export (expdp) and Data Pump Import (impdp) are server-based rather than client-based, as is the case for the original export (exp) and import (imp). Because of this, dump files, log files, and SQL files are accessed relative to server-based directory paths. Data Pump requires that a directory object mapped to a file system directory be specified when invoking the Data Pump import or export.

It is for this reason, and for convenience, that a directory object should be created before using the Data Pump export or import utilities. For example, to create a directory object named expdp_dir located at /u01/backup/exports, enter the following SQL statement:

SQL> create directory expdp_dir as '/u01/backup/exports';

Then grant read and write permissions to the users who will be performing the data pump export and import. 
SQL> grant read,write on directory expdp_dir to system, user1, user2, user3;
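
To confirm that the directory object exists and points to the right path, you can query the DBA_DIRECTORIES view. Note that creating the directory object does not create the operating system directory; the path must already exist and be writable by the Oracle software owner.

SQL> select directory_name, directory_path from dba_directories where directory_name = 'EXPDP_DIR';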

What is Oracle's Data Pump?
Oracle simply states that Data Pump is a "very high-speed" mechanism for moving data and metadata from one database to another. Is Data Pump a complete replacement for export (exp) and import (imp)? Oracle clearly states that the only time you should use the original exp and imp is when you need backward compatibility with an earlier version that does not have Data Pump export and import (expdp and impdp): basically, if you want to import into a pre-10g database, or import into a 10g database from a pre-10g export.

The following processes are involved in a Data Pump operation:

Client Process:     This process is started by the client utility (expdp or impdp) and makes calls to the Data Pump API. Once the Data Pump job is initiated, this process is not necessary for the progress of the job.

Shadow Process:     When the client logs into the database, a foreground process is created to service the client's Data Pump API requests. This process creates the master table and the Advanced Queuing queues used for communication. Once the client process ends, the shadow process goes away as well.

Master Control Process:     The MCP controls the execution of the Data Pump job; there is one MCP per job. The MCP divides the job into various metadata and data load or unload tasks and hands them over to the worker processes.

Worker Process:     The MCP creates worker processes based on the value of the PARALLEL parameter. The worker processes perform the tasks requested by the MCP.
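
One way to see these processes in action is to look at the sessions attached to a running job. The following query (a sketch, assuming a Data Pump job is currently active) joins DBA_DATAPUMP_SESSIONS to V$SESSION to list the shadow, master control, and worker sessions:

select s.sid, s.serial#, s.program, j.owner_name, j.job_name
  from dba_datapump_sessions j
  join v$session s on s.saddr = j.saddr;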

Data Pump is a server-side technology that can transfer large amounts of data very quickly, using parallel streams to achieve maximum throughput; it can be 15-45% faster than the older export/import utilities. Advantages of using Data Pump are:

·           Ability to estimate job times and space requirements (see the example after this list)
·           Ability to restart failed jobs
·           Perform fine-grained object selection
·           Monitor running jobs
·           Directly load a database from a remote instance via the network
·           Remapping capabilities
·           Improved performance using parallel execution
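
For example, the estimate capability can be used on its own: with ESTIMATE_ONLY=y the export calculates how much space the job would need (and so gives a feel for how long it will take) without writing a dump file. A minimal sketch, reusing the hr schema and the expdp_dir directory object used elsewhere in this post:

$ expdp hr/hr DIRECTORY=expdp_dir SCHEMAS=hr ESTIMATE_ONLY=y ESTIMATE=BLOCKS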


Data Pump Uses

You can use Data Pump for the following:
·         Migrating databases
·         Copying databases
·         Transferring Oracle databases between different operating systems
·         Backing up important tables before you change them
·         Moving database objects from one tablespace to another
·         Transporting tablespaces between databases
·         Reorganizing fragmented table data
·         Extracting the DDL for tables and other objects such as stored procedures and packages

Data Pump components
Data Pump technology consists of three major components:
·         dbms_datapump - the main engine that drives data and metadata loading and unloading (a short PL/SQL sketch follows this list)
·         dbms_metadata - used to extract the appropriate metadata
·         command-line clients - expdp and impdp, the export and import equivalents
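
For most work the expdp and impdp clients are all you need, but the underlying dbms_datapump API can also be called directly from PL/SQL. The following block is a minimal sketch (the job name and file names are just examples) of a schema-mode export of HR into the expdp_dir directory object created earlier:

DECLARE
  h1         NUMBER;
  job_state  VARCHAR2(30);
BEGIN
  -- Open a schema-mode export job; the job name is arbitrary
  h1 := DBMS_DATAPUMP.OPEN(operation => 'EXPORT',
                           job_mode  => 'SCHEMA',
                           job_name  => 'HR_EXP_JOB');

  -- Dump file and log file are written to the directory object
  DBMS_DATAPUMP.ADD_FILE(handle => h1, filename => 'hr_api.dmp',
                         directory => 'EXPDP_DIR');
  DBMS_DATAPUMP.ADD_FILE(handle => h1, filename => 'hr_api.log',
                         directory => 'EXPDP_DIR',
                         filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);

  -- Restrict the job to the HR schema
  DBMS_DATAPUMP.METADATA_FILTER(handle => h1, name => 'SCHEMA_EXPR',
                                value => 'IN (''HR'')');

  -- Start the job and wait for it to finish
  DBMS_DATAPUMP.START_JOB(h1);
  DBMS_DATAPUMP.WAIT_FOR_JOB(h1, job_state);
  DBMS_OUTPUT.PUT_LINE('Job finished with state: ' || job_state);
END;
/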

                                      Invoking Data Pump Export

You can invoke the data pump export using a command line. Export parameters can be specified directly in the command line.

Full Export Mode:    A full export is specified using the FULL parameter. In a full database export, the entire database is unloaded. This mode requires that you have the EXP_FULL_DATABASE role. Example:

$ expdp system/<password> DIRECTORY=expdp_dir DUMPFILE=expfull.dmp FULL=y LOGFILE=expfull.log


Schema Export Mode:   The schema export mode is invoked using the SCHEMAS parameter. If you do not have the EXP_FULL_DATABASE role, you can export only your own schema. If you have the EXP_FULL_DATABASE role, you can export several schemas in one go. Optionally, you can include the system privilege grants as well.

$ expdp hr/hr DIRECTORY=expdp_dir DUMPFILE=schema_exp.dmp SCHEMAS=hr,sh,oe


Table Export Mode:  This export mode is specified using the TABLES parameter. In this mode, only the specified tables, partitions, and their dependent objects are exported. If you do not have the EXP_FULL_DATABASE role, you can export only tables in your own schema. All of the specified tables must reside in the same schema.

$ expdp hr/hr DIRECTORY=expdp_dir DUMPFILE=tables_exp.dmp TABLES=employees,jobs,departments


                                           Invoking Data Pump Import

The Data Pump import can be invoked from the command line. The import parameters can be specified directly on the command line.

Full Import Mode:   The full import mode loads the entire contents of the source (export) dump file into the target database. However, you must have been granted the IMP_FULL_DATABASE role on the target database. The Data Pump import is invoked using the impdp command with the FULL parameter specified on the command line.

$ impdp system/<password> DIRECTORY=expdp_dir DUMPFILE=expfull.dmp FULL=y LOGFILE=impfull.log

Schema Import Mode:  The schema import mode is invoked using the SCHEMAS parameter. Only the contents of the specified schemas are loaded into the target database. The source dump file can be a full, schema-mode, table-mode, or tablespace-mode export file. If you have the IMP_FULL_DATABASE role, you can specify a list of schemas to load into the target database.

$ impdp hr/hr DIRECTORY=expdp_dir DUMPFILE=expfull.dmp SCHEMAS=hr,sh,oe

Table Import Mode:  This import mode is specified using the TABLES parameter. In this mode, only the specified tables, partitions, and their dependent objects are imported. If you do not have the IMP_FULL_DATABASE role, you can import only tables in your own schema.

$ impdp hr/hr DIRECTORY=expdp_dir DUMPFILE=expfull.dmp TABLES=employees,jobs,departments


Interesting Features
  • Data Pump export and import are not compatible with the old exp & imp. So if you need to import into a pre-10g database it is best to stick with the original export utility (exp).
  • There are new 10g features that are only supported (exported & imported) with the new Data Pump utility.
  • With Data Pump you can allocate parallel threads to fine-tune resource consumption during export and import. You can also have multiple client processes (expdp & impdp) attach to a Data Pump job. Both of these options allow you to throw more resources at getting the Data Pump job completed. Data Pump also works on a set of dump files rather than a single sequential dump file, and you can add additional dump files to a running dump process. I am sure we will have to be concerned here with too many I/O operations to a single disk area.
  • Restartability is supported for export jobs, and you can also stop and restart jobs as you see fit. So if you see a job consuming too many resources, or running too long and about to affect production, you can just stop the job and then restart it at a later time.
  • Because Data Pump works on the database server through packaged procedures and directory structures, you now have the ability to perform export and import operations over the network.
  • Various features allow for the re-mapping, re-naming, including, or excluding of database objects and database structures. As we saw with the DBMS_METADATA package in the last two articles, where we compared across schemas and translated ownership from one schema to another, Data Pump has these types of abilities for moving objects and data.
  • Versioning capabilities of database objects.
  • To use Data Pump you must have the EXP_FULL_DATABASE or IMP_FULL_DATABASE role, depending on whether you will be performing export or import operations. These roles allow you to expdp & impdp across ownership for items such as grants, resource plans, and schema definitions, and to re-map, re-name, or re-distribute database objects or structures.
  • Access to data is through direct path and external tables. Both of which, under the covers, have the same data format so Oracle can switch between the two depending on a best method approach to move your data. Some of which is dependent upon the targeted structure as direct path cannot be used for some object types and database structures.
  • Monitoring mechanisms have been put in place to see how the Data Pump job is performing. You can monitor through the log file created or through one of three views that assist in the monitoring of Data Pump jobs (example queries follow this list):
·         DBA_DATAPUMP_JOBS - shows all the active Data Pump jobs and details on the state of each job.
·         DBA_DATAPUMP_SESSIONS - shows the user sessions that are attached to a Data Pump job.
·         V$SESSION_LONGOPS - shows the progress of a Data Pump job in terms of how much work it needs to do and how much it has done.
  • Through the use of internal tables, and what Oracle calls master & worker processes, there exists an intercommunication between the processes that are performing work on behalf of Data Pump and internal logging information that allows the user, and Oracle, to understand where in the process Data Pump is.
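
For example, the following queries give a quick picture of what Data Pump is doing (substitute your own job name in the second query; the default names look like SYS_EXPORT_FULL_01):

-- Jobs known to the database and their current state
select owner_name, job_name, operation, job_mode, state, degree
  from dba_datapump_jobs;

-- Approximate progress of a running job
select opname, sofar, totalwork,
       round(sofar / totalwork * 100, 1) as pct_done
  from v$session_longops
 where opname like 'SYS_EXPORT%'
   and totalwork > 0;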
 The available modes are as follows:
·         Full Import/Export Mode
·         Schema Mode
·         Table Mode
·         Tablespace Mode
·         Transportable Tablespace Mode (see the sketch below)
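
Transportable tablespace mode is the only one of these modes not shown elsewhere in this post. A minimal sketch: the tablespace is made read only, the metadata is exported with TRANSPORT_TABLESPACES, and the datafiles themselves are copied to the target system at the operating system level (on import, impdp is pointed at the copied files with TRANSPORT_DATAFILES).

SQL> alter tablespace users read only;

$ expdp system/<password> DIRECTORY=expdp_dir DUMPFILE=tts_users.dmp TRANSPORT_TABLESPACES=users TRANSPORT_FULL_CHECK=y LOGFILE=tts_users.log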

Running Data Pump
You can either run Data Pump from the command line, specifying options directly, or use a parameter file. There are many options to Data Pump, so it is best to check the Oracle documentation; I have given a few examples below.

                                                Exporting
Database
$ expdp vallep/password directory=datapump full=y dumpfile=data.dmp filesize=2G parallel=2 logfile=full.log

NOTE: Increase the PARALLEL option based on the number of CPUs you have.

Schema
$ expdp sys/password schemas=testuser dumpfile=data.dmp logfile=schema.log

Table
$ expdp vallep/password tables=accounts,employees dumpfile=data.dmp content=metadata_only

Tablespace
$ expdp vallep/password tablespaces=users dumpfile=data.dmp logfile=tablespace.log

                                                 Importing
Database
$ impdp system/password full=y dumpfile=data.dmp nologfile=y

Schema change
$ impdp system/password schemas='HR' remap_schema='HR:HR_TEST' content=data_only
$ impdp system/passwd remap_schema='TEST:TEST3' tables=test logfile=… dumpfile=… directory=…

Other Options
·         directory - specifies an Oracle directory object
·         filesize - split the dump file into files of a specific size (could be used if the filesystem has a 2GB limit)
·         parfile - specify a parameter file (see the sketch below)
·         compression - compression is used by default, but you can turn it off
·         exclude/include - metadata filtering
·         query - selectively export table data using a SQL statement
·         estimate - tells you how much space is required without running the job
·         network_link - you can perform an export across a network
·         encryption - you can encrypt data within the dump file
·         parallel - increase the number of worker processes to increase throughput; base it on the number of CPUs
·         remap_schema - move objects from one schema to another
·         remap_datafile - change the name of the datafile when moving across different systems
·         remap_tablespace - move objects from one tablespace to another
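
Because long option lists and quoting get awkward on the command line, the parfile option above is often the easiest route. A small hypothetical parameter file, say exp_hr.par (the values are only illustrative):

DIRECTORY=expdp_dir
DUMPFILE=hr_%U.dmp
LOGFILE=hr_exp.log
SCHEMAS=hr
PARALLEL=2
FILESIZE=2G

It is then referenced like this:

$ expdp system/<password> PARFILE=exp_hr.par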

You can exclude objects while performing an export by using the EXCLUDE option of the Data Pump utility. For example, if you are exporting a schema and don't want to export tables whose names start with "A", you can type the following command:
$ expdp scott/tiger DIRECTORY=data_pump_dir DUMPFILE=scott_schema.dmp SCHEMAS=SCOTT EXCLUDE=TABLE:"like 'A%'"

Then all tables in Scott's schema whose names start with "A" will not be exported.
Similarly, you can use the INCLUDE option to export only certain objects, like this:

$ expdp scott/tiger DIRECTORY=data_pump_dir DUMPFILE=scott_schema.dmp SCHEMAS=SCOTT INCLUDE=TABLE:"like 'A%'"

This is the opposite of the EXCLUDE option, i.e. it will export only those tables in Scott's schema whose names start with "A".
Similarly, you can also exclude or include other object types such as INDEX, CONSTRAINT, GRANT, USER, and SCHEMA.


You can use the QUERY option to export only the required rows. For example, the following will export only those rows of the emp table whose salary is above 10000 and whose department id is greater than 10.
 expdp hr/hr QUERY=emp:'"WHERE dept_id > 10 AND sal > 10000"'
        NOLOGFILE=y DIRECTORY=dpump_dir1 DUMPFILE=exp1.dmp


You can suspend running export jobs and later resume or kill these jobs using Data Pump Export. You can start a job on one client machine and then, if some other work comes up, suspend it. Afterwards, when that work has finished, you can continue the job from the same client where you stopped it, or you can restart the job from another client machine.
For example, suppose a DBA starts a full database export at one client machine, CLNT1, by typing the following command:

$ expdp scott/tiger@mydb FULL=y DIRECTORY=data_pump_dir DUMPFILE=full.dmp LOGFILE=myfullexp.log JOB_NAME=myfullJob

After some time, the DBA wants to stop this job temporarily, so he presses CTRL+C to enter interactive mode. He will then get the Export> prompt, where he can type interactive commands.
To stop the export job, he types the following command:

Export> STOP_JOB=IMMEDIATE
Are you sure you wish to stop this job ([y]/n): y

The job is placed in a stopped state and the client exits.

After finishing his other work, the DBA wants to resume the export job, but the client machine from which he originally started the job is unavailable because that user has locked his cabin. So the DBA goes to another client machine and reattaches to the job by typing the following command:

$ expdp hr/hr@mydb ATTACH=myfulljob
After the job status is displayed, he can issue the CONTINUE_CLIENT command to resume logging mode and restart the myfulljob job.

Export> CONTINUE_CLIENT

A message is displayed that the job has been reopened, and processing status is output to the client.
Note: After reattaching to the job, a DBA can also kill the job by typing KILL_JOB if he doesn't want to continue with the export job.
