CREATE EXTERNAL TABLE AS SELECT (Transact-SQL)

**APPLIES TO:** ![no](media/no.png)SQL Server ![no](media/no.png)Azure SQL Database ![yes](media/yes.png)Azure SQL Data Warehouse ![yes](media/yes.png)Parallel Data Warehouse

Creates an external table and then exports, in parallel, the results of a Transact\-SQL SELECT statement to Hadoop or Azure Storage Blob.

Topic link icon Transact-SQL Syntax Conventions (Transact-SQL)

Syntax

  
CREATE EXTERNAL TABLE [ [database_name  . [ schema_name ] . ] | schema_name . ] table_name   
    WITH (   
        LOCATION = 'hdfs_folder',  
        DATA_SOURCE = external_data_source_name,  
        FILE_FORMAT = external_file_format_name  
        [ , <reject_options> [ ,...n ] ]  
    )  
    AS <select_statement>  
[;]  
  
<reject_options> ::=  
{  
    | REJECT_TYPE = value | percentage  
    | REJECT_VALUE = reject_value  
    | REJECT_SAMPLE_VALUE = reject_sample_value  
}  
  
<select_statement> ::=  
    [ WITH <common_table_expression> [ ,...n ] ]  
    SELECT <select_criteria>  

Arguments

[ [ database_name . [ schema_name ] . ] | schema_name . ] table_name
The one to three-part name of the table to create in the database. For an external table, only the table metadata is stored in the relational database.

LOCATION = ‘hdfs_folder
Specifies where to write the results of the SELECT statement on the external data source. The location is a folder name and can optionally include a path that is relative to the root folder of the Hadoop Cluster or Azure Storage Blob. PolyBase will create the path and folder if it does not already exist.

The external files are written to hdfs_folder and named QueryID_date_time_ID.format, where ID is an incremental identifier and format is the exported data format. For example, QID776_20160130_182739_0.orc.

DATA_SOURCE = external_data_source_name
Specifies the name of the external data source object that contains the location where the external data is stored or will be stored. The location is either a Hadoop Cluster or an Azure Storage Blob. To create an external data source, use CREATE EXTERNAL DATA SOURCE (Transact-SQL).

FILE_FORMAT = external_file_format_name
Specifies the name of the external file format object that contains the format for the external data file. To create an external file format, use CREATE EXTERNAL FILE FORMAT (Transact-SQL).

Reject Options
The reject options do not apply at the time this CREATE EXTERNAL TABLE AS SELECT statement is run. Instead, they are specified here so that the database can use them at a later time when it imports data from the external table. Later, when the CREATE TABLE AS SELECT statement selects data from the external table, the database will use the reject options to determine the number or percentage of rows that can fail to import before it stops the import.

REJECT_VALUE = reject_value
Specifies the value or the percentage of rows that can fail to import before database halts the import.

REJECT_TYPE = value | percentage
Clarifies whether the REJECT_VALUE option is specified as a literal value or a percentage.

value
REJECT_VALUE is a literal value, not a percentage. The database will stop importing rows from the external data file when the number of failed rows exceeds reject_value.

For example, if REJECT_VALUE = 5 and REJECT_TYPE = value, the database will stop importing rows after 5 rows have failed to import.

percentage
REJECT_VALUE is a percentage, not a literal value. The database will stop importing rows from the external data file when the percentage of failed rows exceeds reject_value. The percentage of failed rows is calculated at intervals.

REJECT_SAMPLE_VALUE = reject_sample_value
Required when REJECT_TYPE = percentage, this specifies the number of rows to attempt to import before the database recalculates the percentage of failed rows.

For example, if REJECT_SAMPLE_VALUE = 1000, the database will calculate the percentage of failed rows after it has attempted to import 1000 rows from the external data file. If the percentage of failed rows is less than reject_value, the database will attempt to load another 1000 rows. The database continues to recalculate the percentage of failed rows after it attempts to import each additional 1000 rows.

[!NOTE]
Since the database computes the percentage of failed rows at intervals, the actual percentage of failed rows can exceed reject_value.

Example:

This example shows how the three REJECT options interact with each other. For example, if REJECT_TYPE = percentage, REJECT_VALUE = 30, and REJECT_SAMPLE_VALUE = 100, the following scenario could occur:

WITH common_table_expression
Specifies a temporary named result set, known as a common table expression (CTE). For more information, see WITH common_table_expression (Transact-SQL).

SELECT <select_criteria> Populates the new table with the results from a SELECT statement. select_criteria is the body of the SELECT statement that determines which data to copy to the new table. For information about SELECT statements, see SELECT (Transact-SQL).

Permissions

To run this command the database user needs all of these permissions or memberships:

The login needs all of these permissions:

[!IMPORTANT]
The ALTER ANY EXTERNAL DATA SOURCE permission grants any principal the ability to create and modify any external data source object, and therefore, it also grants the ability to access all database scoped credentials on the database. This permission must be considered as highly privileged, and therefore must be granted only to trusted principals in the system.

Error Handling

When CREATE EXTERNAL TABLE AS SELECT exports data to a text-delimited file, there is no rejection file for rows that fail to export.

When creating the external table, the database attempts to connect to the external Hadoop cluster or Azure Storage Blob. If the connection fails, the command will fail and the external table will not be created. It can take a minute or more for the command to fail since the database retries the connection at least 3 times.

If CREATE EXTERNAL TABLE AS SELECT is cancelled or fails, the database will make a one-time attempt to remove any new files and folders already created on the external data source.

The database will report any Java errors that occur on the external data source during the data export.

General Remarks

After the CETAS statement finishes, you can run Transact\-SQL queries on the external table. These operations will import data into the database for the duration of the query unless you import by using the CREATE TABLE AS SELECT statement.

The external table name and definition are stored in the database metadata. The data is stored in the external data source.

The external files are named QueryID_date_time_ID.format, where ID is an incremental identifier and format is the exported data format. For example, QID776_20160130_182739_0.orc.

The CETAS statement always creates a non-partitioned table, even if the source table is partitioned.

For query plans, created with EXPLAIN, the databaseuses these query plan operations for external tables:

APPLIES TO: Parallel Data Warehouse a prerequisite for creating an external table, the appliance administrator needs to configure hadoop connectivity. For more information, see Configure Connectivity to External Data (Analytics Platform System) in the APS documentation which you can download from here.

Limitations and Restrictions

Since external table data resides outside of the database, backup and restore operations will only operate on data stored in the database. This means only the metadata will be backed up and restored.

The database does not verify the connection to the external data source when restoring a database backup that contains an external table. If the original source is not accessible, the metadata restore of the external table will still succeed, but SELECT operations on the external table will fail.

The database does not guarantee data consistency between the databaseand the external data. You, the customer, are solely responsible to maintain consistency between the external data and the database.

Data manipulation language (DML) operations are not supported on external tables. For example, you cannot use the Transact\-SQL update, insert, or delete Transact\-SQL Transact\-SQL to modify the external data.

CREATE TABLE, DROP TABLE, CREATE STATISTICS, DROP STATISTICS, CREATE VIEW, and DROP VIEW are the only data definition language (DDL) operations allowed on external tables.

PolyBase can consume a maximum of 33k files per folder when running 32 concurrent PolyBase queries. This maximum number includes both files and subfolders in each HDFS folder. If the degree of concurrency is less than 32, a user can run PolyBase queries against folders in HDFS which contain more than 33k files. Microsoft recommends users of Hadoop and PolyBase keep file paths short and use no more than 30k files per HDFS folder. When too many files are referenced a JVM out-of-memory exception occurs.

SET ROWCOUNT (Transact-SQL) has no effect on this CREATE EXTERNAL TABLE AS SELECT. To achieve a similar behavior, use TOP (Transact-SQL).

When CREATE EXTERNAL TABLE AS SELECT selects from an RCFile, the column values in the RCFile must not contain the pipe ‘|’ character.

Locking

Takes a shared lock on the SCHEMARESOLUTION object.

Examples

A. Create a Hadoop table using CREATE EXTERNAL TABLE AS SELECT (CETAS)

The following example creates a new external table named hdfsCustomer, using the column definitions and data from the source table dimCustomer.

The table definition is stored in the database, and the results of the SELECT statement are exported to the ‘/pdwdata/customer.tbl’ file on the Hadoop external data source customer_ds. The file is formatted according to the external file format customer_ff.

The file name is generated by the database, and contains the query ID for ease of aligning the file with the query that generated it.

The path hdfs://xxx.xxx.xxx.xxx:5000/files/ preceding the Customer directory must already exist. However, if the Customer directory does not exist, the database will create the directory.

[!NOTE]
This example specifies for 5000. If the port is not specified, the database uses 8020 as the default port.

The resulting Hadoop location and file name will be hdfs:// xxx.xxx.xxx.xxx:5000/files/Customer/ QueryID_YearMonthDay_HourMinutesSeconds_FileIndex.txt..

  
      -- Example is based on AdventureWorks   
CREATE EXTERNAL TABLE hdfsCustomer  
WITH (  
        LOCATION='/pdwdata/customer.tbl',  
        DATA_SOURCE = customer_ds,  
        FILE_FORMAT = customer_ff  
) AS SELECT * FROM dimCustomer;  

B. Use a Query Hint with CREATE EXTERNAL TABLE AS SELECT (CETAS)

This query shows the basic syntax for using a query join hint with the CETAS statement. After the query is submitted the database uses the hash join strategy to generate the query plan. For more information on join hints and how to use the OPTION clause, see OPTION Clause (Transact-SQL).

[!NOTE]
This example specifies for 5000. If the port is not specified, the database uses 8020 as the default port.

  
      -- Example is based on AdventureWorks  
CREATE EXTERNAL TABLE dbo.FactInternetSalesNew  
WITH   
    (   
        LOCATION = '/files/Customer',  
        DATA_SOURCE = customer_ds,  
        FILE_FORMAT = customer_ff  
    )  
AS SELECT T1.* FROM dbo.FactInternetSales T1 JOIN dbo.DimCustomer T2  
ON ( T1.CustomerKey = T2.CustomerKey )  
OPTION ( HASH JOIN );  

See Also

CREATE EXTERNAL DATA SOURCE (Transact-SQL)
CREATE EXTERNAL FILE FORMAT (Transact-SQL)
CREATE EXTERNAL TABLE (Transact-SQL)
CREATE TABLE (Azure SQL Data Warehouse, Parallel Data Warehouse)
CREATE TABLE AS SELECT (Azure SQL Data Warehouse)
DROP TABLE (Transact-SQL)
ALTER TABLE (Transact-SQL)