
Loading SAP Ariba Data In Snowflake

Author: Sudhanshu Kumar


Moving data from a particular system in SAP to Snowflake can be daunting, fraught with technical and procedural challenges. Unfortunately, there are not many tools available in the market that can simplify the process.


Furthermore, the few available options are not cost-effective, making it a significant obstacle for companies or teams looking to streamline their data transfer. These limitations could lead to costly delays, hamper productivity, and hinder business growth.


As a reference, consider loading SAP Ariba (supplier and vendor) data into Snowflake for further analysis. The column names exported from Ariba are typically four- to five-letter abbreviations of German words, making the data difficult to read, understand, and analyze unless you have an SAP expert on hand. So let’s explore a quick and easy way to achieve this using an SAP-provided data dictionary table.


Overview

In this article, we will go through the complete process of loading SAP data from tables like EKPO, MARA, etc., into Snowflake. The columns of these tables are named with five-letter abbreviations of German words. Some common examples are AEDAT (a date field) and EBELP (Ariba item number).


For this solution, we will only be focusing on data from SAP ARIBA (supplier data), but this solution can be generalized to fit a broader use case and support other SAP systems as well.


Now, like any translator, we need a dictionary to translate each five-letter German abbreviation into a meaningful English name. In the example above, to translate EBELP to “Ariba Item Number”, we need somewhere to look this information up. If we look back at SAP, where would these translations come from? From a data dictionary table, i.e. one of the tables whose names start with ‘DD’.


By following the same approach across multiple tables, each with many such columns, we can automate the process with our good old friends Python and Snowflake tasks. We can set this up to copy data from SAP to Snowflake every week or month, based on the requirement. In this article, we will only go over the first part: setting up the loading process from SAP tables into Snowflake.


Prerequisites

To achieve this, we require the following conditions to be met:

  1. SAP data in Amazon S3 storage in parquet format. (An SAP blog, linked in the Appendix, covers downloading the data as XML; it can then be converted to the required parquet format, as sketched after this list.)

  2. External stage set up pointing to the S3 bucket where your parquet files are stored.

  3. Data dictionary set up in the target schema.
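For prerequisite (1), the XML export from SAP needs to end up as parquet files in S3. Here is a minimal sketch of that conversion, assuming the export is a flat XML document that pandas can parse and that AWS credentials are already configured; the file names and bucket are example placeholders:

import boto3
import pandas as pd

# Parse the SAP XML export into a DataFrame (assumes a flat, row-oriented XML layout)
df = pd.read_xml("ekpo_export.xml")

# Write it out as parquet (requires pyarrow or fastparquet)
df.to_parquet("ekpo.parquet", index=False)

# Upload the parquet file to the S3 bucket that the external stage points to
s3 = boto3.client("s3")
s3.upload_file("ekpo.parquet", "sap_tables_parquet", "ekpo.parquet")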


The data dictionary maps each abbreviated SAP field name to its descriptive English text.

This can be created by exporting the DD-series tables and compiling them all into a single table inside Snowflake. If you need any help doing this, you can always reach out to us at info@kipi.bi.
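For reference, here is a minimal sketch of the shape the translation code later in this article expects: a table (called DATA_DICT by default) with FIELDNAME and DDTEXT columns. The DDL and sample value are illustrative only, and session is an existing Snowpark session:

# A compiled data dictionary table as the translation function expects it:
# one row per SAP field name with its descriptive text from the DD-series tables.
session.sql("""
    create table if not exists DATA_DICT (
        fieldname varchar,   -- abbreviated SAP field name, e.g. EBELP
        ddtext    varchar    -- descriptive text, e.g. Ariba Item Number
    )
""").collect()

session.sql("insert into DATA_DICT (fieldname, ddtext) values ('EBELP', 'Ariba Item Number')").collect()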


Loading Process

As part of prerequisite (1), the files need to be in parquet format. This is required because Snowflake provides a very useful function, infer_schema, which infers the column data types present in the file for us.


Now, if we look back at the original file, half of our problem is solved by inferring the column data type, and the remaining half is the column name itself. For this, we would refer to point (3) of the prerequisites, which is the data dictionary. It has all the information about columns and what they translate into. If you have set up the data dictionary table in the target schema, then most of our work is done. Using Python, we have created a stored procedure that would translate the column name and load the data into the table.


Let’s look at the overall design using the diagram below:

To create an external stage, follow Snowflake’s documentation. To check that your stage is set up properly and shows the files you uploaded, use the command list @<your_stage_name>. On running this query, you should see all your parquet files; if that’s not the case, go back to the documentation and check whether you missed a step.


All the file URIs in the external stage point to files in S3, for example s3://sap_tables_parquet/ekpo.parquet. From the file URI we can extract the file name, ekpo, and check whether a table with that name already exists.
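A short Snowpark sketch of that extraction, with session as an existing Snowpark session and my_sap_stage as a placeholder stage name:

import os

# List the external stage and derive a candidate table name from each parquet file
files = session.sql("list @my_sap_stage").collect()
for row in files:
    file_uri = row["name"]                # e.g. s3://sap_tables_parquet/ekpo.parquet
    table_name = os.path.splitext(os.path.basename(file_uri))[0].upper()   # -> EKPO
    print(file_uri, "->", table_name)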


Here’s the SQL code to check if a table exists in the schema:


use database <your_database>;

select to_boolean(count(1))
from information_schema.tables
where table_schema = '<schema_name>'
  and table_name = '<table_name>';


If the table doesn’t exist in our target schema, we start by inferring the columns and their data types. For the column names, we look up each field in our data dictionary to get its meaning, which is a short English phrase with spaces in it.


The SQL query used to infer the schema looks something like this (the {stage_name}, {prq_file}, and {file_format} placeholders are filled in by the Python stored procedure):


select column_name,
       case when type = 'TEXT' then 'VARCHAR' else type end as type
from table(
    infer_schema(
        location => '@{stage_name}',
        files => '{prq_file}',
        file_format => '{file_format}'));


In further processing, we replace the spaces with underscores and convert all words to lowercase. So, from our example, ‘EBELP’ becomes “ariba_item_number”.


Now that we have the English translations of the columns, we plug this information back into the infer_schema results, which gives us the column names and data types needed to create the table. Then we load the data directly into this newly created table.
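Here is a minimal sketch of those two steps, assuming a Snowpark session, the (column, type) pairs returned by the infer_schema query above, and the translation dictionary produced by the function shown below. create_and_load is a hypothetical helper, and the stage, file, and table names are placeholders:

# Minimal sketch: create the target table with the translated column names and
# inferred types, then load the parquet file into it with a COPY transformation.
# 'inferred' is a list of (sap_column, data_type) pairs from infer_schema and
# 'translated' is the dict returned by translate_column_names.
def create_and_load(session, table_name, inferred, translated, stage_name, prq_file):
    columns_ddl = ", ".join(
        f"{translated.get(col, col).lower()} {dtype}" for col, dtype in inferred
    )
    session.sql(f"create table if not exists {table_name} ({columns_ddl})").collect()

    # Select each original SAP column out of the parquet file and cast it to the
    # inferred type, so the renamed target columns line up by position.
    select_list = ", ".join(f'$1:"{col}"::{dtype}' for col, dtype in inferred)
    session.sql(f"""
        copy into {table_name}
        from (select {select_list} from @{stage_name}/{prq_file})
        file_format = (type = parquet)
    """).collect()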


The Python code that translates the column names is the heart of this process. Here is the function that performs this translation.


import re


def translate_column_names(snow_conn, columns: list, data_dictionary_table="DATA_DICT") -> dict:
    """
    For every column name in `columns`, look up its meaningful text in the data
    dictionary and format it to remove any special characters.
    :return: Dictionary mapping SAP column name -> English text
    """
    translated_col_dict = {}
    visited = {}

    if len(columns) != 0:
        # Build the lookup query against the data dictionary, e.g.
        # select distinct fieldname, ddtext from DATA_DICT where fieldname in ('EBELP', ...)
        field_sql = f"select distinct fieldname, ddtext from {data_dictionary_table} where fieldname in ( "
        for col in columns:
            field_sql = field_sql + f"'{col}'" + ","
        field_sql = field_sql[:-1]
        field_sql += ")"
        print(field_sql)
        sql_result = snow_conn.sql(field_sql).collect()
        print("Translation result -> ", sql_result)
    else:
        print("Error while translating, check column names")
        return {}

    for row in sql_result:
        # Process the translated text: remove any non-alphanumeric character
        # and replace whitespace with "_"
        if row[1] != "" and row[1] is not None:
            dd_text = re.sub("[^a-zA-Z0-9 ]", "", row[1])
            dd_text = re.sub("\\s", "_", dd_text)
            # Two or more field names can share the same translation, so suffix
            # the field name to keep the generated column names unique
            if dd_text in visited.keys():
                dd_text = dd_text + "_" + str(row[0])
            else:
                visited[dd_text] = 1
            translated_col_dict[str(row[0])] = str(dd_text)
        else:
            # No descriptive text found: fall back to the field name itself,
            # with any '-' replaced so it stays a valid column name
            field_name = row[0].replace('-', "_")
            translated_col_dict[str(row[0])] = str(field_name)
    print(translated_col_dict)
    return translated_col_dict
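To make this callable from Snowflake (and schedulable later), the translation and loading logic can be wrapped in a Snowpark stored procedure. Here is a minimal registration sketch, assuming an existing Snowpark session; load_sap_table is a hypothetical handler that ties together infer_schema, translate_column_names, and the create/load steps, and the stage names are placeholders:

from snowflake.snowpark import Session
from snowflake.snowpark.functions import sproc

def load_sap_table(session: Session, stage_name: str, prq_file: str) -> str:
    # Hypothetical handler: infer the schema, translate the column names,
    # create the target table, and copy the parquet file into it.
    # ...
    return f"loaded {prq_file}"

# Register the handler as a permanent stored procedure; @my_code_stage is a
# placeholder stage used to hold the uploaded procedure code.
load_sap_sp = sproc(
    load_sap_table,
    name="LOAD_SAP_TABLE",
    is_permanent=True,
    stage_location="@my_code_stage",
    packages=["snowflake-snowpark-python"],
    replace=True,
    session=session,
)

Once registered, the procedure can be invoked with call LOAD_SAP_TABLE('<stage>', '<file>') or, as described in the Next Steps section, scheduled with a Snowflake task.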


Conclusion

To conclude, we have explored the process of translating and loading SAP Ariba data to Snowflake. By utilizing a data dictionary table and leveraging Snowflake's COPY command with its variety of options and advanced features, users can ensure a robust and secure data-loading process.


This one-time setup provides an excellent level of flexibility, allowing users to tailor the data-loading process to their specific requirements. Furthermore, the COPY command's advanced features, such as data encryption and automatic file compression, enhance data security and efficiency.


By following the steps outlined in this blog, users can seamlessly translate and load SAP Ariba data to Snowflake, setting themselves up for efficient and effective data analysis. With a focus on data integrity and security, users can be confident in the accuracy and safety of their data.


Next Steps

As mentioned in the Overview section, this covers only the first part of the solution: loading the data. A downside of this approach is that it has to be triggered manually every time; in an ideal world, we would like it to be automated.


To automate the process of loading data, we could configure a Snowflake Task to invoke the stored procedure at set intervals, depending on the requirements. This eliminates the need for manual intervention or the triggering of any additional processes. Additionally, Snowflake keeps track of data loading from the same file into the same table, ensuring that users need not be concerned about inadvertently loading the same file multiple times.
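As a sketch of what that automation could look like, assuming the loading logic has been registered as a stored procedure named LOAD_SAP_TABLE (as in the registration sketch earlier); the task name, warehouse, cron schedule, and procedure arguments are placeholders:

# Minimal sketch: schedule the stored procedure with a Snowflake task.
session.sql("""
    create or replace task LOAD_SAP_WEEKLY
        warehouse = MY_WH
        schedule = 'USING CRON 0 6 * * 1 UTC'
        as
        call LOAD_SAP_TABLE('my_sap_stage', 'ekpo.parquet')
""").collect()

# Tasks are created suspended; resume the task so it starts running on schedule.
session.sql("alter task LOAD_SAP_WEEKLY resume").collect()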


Appendix
  1. Creating a semantic layer in Snowflake for SAP Data.

  2. Extracting SAP Ariba data as XML

  3. Sample data dictionary table DD03M
