I have found a couple of points for improving performance at the data flow task level, which I would like to describe below:
Data Flow Performance:
I consider tuning at each of the ETL (Extract, Transform and Load) stages.
Extraction Level:-
Network Tuning:
1. Change the network packet size in the connection manager (a connection string sketch follows this list)
- Higher values typically yield faster throughput
- By default it is 0, which uses the server's configured packet size; the maximum value is 32767
2. Experiment with shared memory vs. TCP/IP as the network protocol
3. Enable jumbo frames on the network; consult your network specialists before doing this
4. If you have OLE DB Command statements, create a separate connection manager with a lower packet size (do not use the 32767 maximum there)
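As a minimal sketch (the server, database and provider names here are assumptions, not from the original), the packet size can be set in the connection manager's connection string:

    Data Source=MyServer;Initial Catalog=MyDW;Provider=SQLNCLI10.1;Integrated Security=SSPI;Packet Size=32767;

The same value can also be set on the All page of the connection manager editor in the designer.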
SQL Queries:
1. Consider using the NOLOCK hint on tables in your source query.
- Removes locking overhead
- Improves the speed of large table scans
- Has risky side effects (dirty reads, for example)
- Understand them before using hints in the source query
2. Write the SELECT query with only the columns you need
3. Do required conversions in the source query instead of a Data Conversion transformation later (see the sketch after this list)
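A minimal sketch combining the three points above (table and column names are hypothetical):

    SELECT CustomerID,                                        -- only the columns needed downstream
           CAST(OrderAmount AS DECIMAL(10,2)) AS OrderAmount, -- convert at the source
           CONVERT(DATE, OrderDate) AS OrderDate
    FROM dbo.Orders WITH (NOLOCK)                             -- no locking overhead, accepts dirty reads
    WHERE OrderDate >= '2012-01-01';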
LookUp:-
1. Change the SELECT statement to use only the columns you need; this optimizes memory usage
2. Consider adding the NOLOCK hint
3. In SSIS 2008, use a shared lookup cache: create a Cache Transform and a Cache Connection Manager
4. Use a SQL query for the reference table instead of selecting the whole table
5. Use a WHERE condition in the Lookup's SQL query (see the sketch after this list)
6. When your input row count is large, use a partial cache or full cache lookup to improve performance
7. A no-cache lookup is a row-by-row operation (one query per input row)
8. A partial cache builds the cache while the Lookup transformation is executing, which also brings a high transaction impact; it is a viable option when you have a large number of input rows
9. The best option, however, is usually a full cache combined with a filtered reference table
10. Apply an 80/20 rule and load one Lookup transformation with the 20 percent most common matching records, which will satisfy roughly 80 percent of matches from the full cache
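As a minimal sketch of points 4, 5 and 9 (table and column names are hypothetical), the Lookup's reference query could be:

    SELECT ProductID, ProductName        -- only the columns the Lookup uses
    FROM dbo.Products WITH (NOLOCK)
    WHERE IsActive = 1;                  -- filter the reference rows before they enter the full cache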
Transform Level:-
Different transformations:
1. Row based (synchronous)
- Logically works row by row
- Examples: Data Conversion, Derived Column
- Buffers are reused
2. Partially blocking (asynchronous)
- Works with groups of rows
- Examples: Merge, Merge Join, Union All
- Data is copied to new buffers
3. Blocking (asynchronous)
- Needs all input rows before producing any output rows
- Examples: Aggregate, Sort, Pivot and Unpivot
- Data is copied to new buffers
Data Types:
1. Make data types as narrow as possible so you allocate less memory for your transformations
2. Do not perform excessive casting of data types
- It degrades performance
- Cast at the database using CAST/CONVERT functions in the source query wherever possible (see the sketch below)
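A minimal sketch of narrowing at the source (table, column names and sizes are hypothetical): a column stored wider than its data needs can be cast down so the pipeline allocates a small buffer column instead of a wide one:

    SELECT OrderID,
           CAST(StatusCode AS TINYINT)    AS StatusCode,  -- narrow integer instead of INT
           CAST(Comment AS NVARCHAR(200)) AS Comment      -- avoid NVARCHAR(MAX), which maps to DT_NTEXT
    FROM dbo.OrderStatus;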
Push up or down wherever possible:
1. Data flow transformations can often be optimized
- Sort: push sorting into the source query when possible instead of using the Sort transformation, e.g. for cross-database joins
- Set the IsSorted and SortKeyPosition options to mark OLE DB source data as already sorted instead of adding a Sort transformation
- Use a MERGE SQL statement instead of the SCD and Lookup transformations for handling slowly changing dimension data (see the sketch after this list)
- Use GROUP BY in the SQL source query instead of the Aggregate transformation
- Use an INSERT INTO ... SELECT statement instead of a data flow task when source and destination are on the same instance
- Evaluate delta (incremental) loads versus full reloads of the data
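A minimal sketch of the MERGE approach for a Type 1 slowly changing dimension (dimension, staging and column names are hypothetical):

    MERGE dbo.DimCustomer AS tgt
    USING dbo.StgCustomer AS src
        ON tgt.CustomerKey = src.CustomerKey
    WHEN MATCHED AND tgt.City <> src.City THEN
        UPDATE SET City = src.City                    -- Type 1: overwrite the changed attribute
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerKey, City)
        VALUES (src.CustomerKey, src.City);           -- add rows that are new to the dimension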
Loading Level:-
1. Use the SQL Server destination
- Only when the package and SQL Server run on the same machine
- Its error handling is weaker than the OLE DB destination's
2. Maximum insert commit size = 0 commits the whole load in a single batch, which is fastest
3. Drop (and rebuild) some indexes, depending on the load growth percentage
4. Load data into partitioned tables
5. Use TRUNCATE TABLE instead of a DELETE statement when emptying a table (see the sketch after this list)
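A minimal pre-load/post-load sketch for points 3 and 5 (table and index names are hypothetical):

    -- before the data flow runs
    ALTER INDEX IX_SalesStage_CustomerID ON dbo.SalesStage DISABLE;
    TRUNCATE TABLE dbo.SalesStage;       -- minimally logged, unlike DELETE

    -- ... data flow loads dbo.SalesStage ...

    -- after the data flow completes
    ALTER INDEX IX_SalesStage_CustomerID ON dbo.SalesStage REBUILD;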
Other options to consider (data flow task properties):
1. BLOB Temp Storage Path: where Binary Large Object (BLOB) data is spooled when buffers overflow to disk
2. Buffer Temp Storage Path: change the Buffer Temp Storage Path and BLOB Temp Storage Path to drive locations that have been optimized, and ensure they are not on the C: system drive
3. Default Buffer Max Rows - 10,000 means no single buffer can hold more than 10,000 rows
4. Default Buffer Size - 10,485,760 specifies the number of bytes a single buffer cannot exceed.
If a single row is 1,200 bytes, then max rows (10,000) times row width (1,200) is
10,000 * 1,200 = 12,000,000 bytes, which is greater than the default buffer size, so the engine reduces the rows per buffer to fit (10,485,760 / 1,200 ≈ 8,738 rows)
5. Engine Threads - 20 is a hint for how many threads the data flow engine can use
6. Run in Optimized Mode - True - removes unused source columns, destination columns and transformations from the pipeline