Author: Amarendra Singh
Introduction:
In this blog, we are going to understand when we have semi-structure files in a local or any cloud location and have to ingest them in the Snowflake ecosystem. It’s hard to manually create column names in Snowflake and then load them in the table.
Because that takes lots of effort and is a manual process.
Supported Files formats are Apache Parquet, Apache Avro, and ORC files.
Using infer_schema we can handle semi-structured data.
Infer_schema automatically detects the column name and the type of expression in the staged file.
And then, we need to store it in a template. Using this template, we created the table in our Snowflake ecosystem.
When is this feature required?
Suppose you have a bulk of semi-structured files with lots of columns in the external table, user table, and Internally named stage except the staging table.
We have to ingest this data and also create an automated data pipeline.
Using these 3 features we can automate this process:
1. Schema On Read In Snowflake(INFER_SCHEMA)
Using this Snowflake feature, we can automatically manage dynamic column names for the semi-structured data. (here Parquet)
Using this code, we can fetch the data from the internal Stage or external Stage.

INFER_SCHEMA(
LOCATION => '{ internalStage | externalStage }'
, FILE_FORMAT => '<file_format_name>'
, FILES => '<file_name>' [ , '<file_name>' ] [ , ... ]
, IGNORE_CASE => TRUE | FALSE
)
In this scenario, we are loading the parquet file from amazon S3 to the Snowflake external Stage.
2. GENERATE_COLUMN_DESCRIPTION
Using this feature generates a list of columns from stages files that had semi-structured data using inferred schema output results. The output from INFER_SCHEMA can be used as input for creating a table, external table, or view based on column definitions of the stages files.
After generating the column description, I have to create a template to store this output column name.
Using this, we can create the table as per requirement.

3. CREATE TABLE … USING TEMPLATE
Using generate_column_description we can create the template, and using this template, we can create a table with a column name.
Using this template, we have created the table in our Snowflake ecosystem.

This feature is not supported for the internal table stage.
create file format my_parquet_format
type = parquet;
Retrieve the column definitions for Parquet files
-- Query the INFER_SCHEMA function.
select *
from table(
infer_schema(
location=>'@mystage'
, file_format=>'my_parquet_format'
)
);
select *
from table(
infer_schema(
location=>'@mystage'
, file_format=>'my_parquet_format'
, ignore_case=>true
)
);
Overall Code as per demo data sets URL: Infer_Schema.docx - Google Docs
Note:
Using these Snowflake features help you to load bulk semi-structured data in your Snowflake ecosystem.