Avamar DS18 = Utility Node + Spare Node + 16 Active Data Nodes
For a 3.3 TB Gen-3 Grid
RAID Configuration:
How to calculate the required capacity:
Data Gathering
Note: Agent only vs. data store depends on the desired RTO
If RTO < restore_rate then data store else agent only
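The rule above can be sketched as a small helper. This is only an illustration: it assumes "restore_rate" means the estimated time a restore would take, expressed in the same units as the RTO, and the function and parameter names are hypothetical.

```python
def choose_deployment(rto_hours: float, estimated_restore_hours: float) -> str:
    """Apply the rule of thumb: a data store when the RTO is tighter than
    the estimated restore time, otherwise agent only.

    Assumption: both arguments are durations in hours.
    """
    if rto_hours < estimated_restore_hours:
        return "data store"
    return "agent only"


if __name__ == "__main__":
    # Example: a 4 hour RTO against an estimated 8 hour restore.
    print(choose_deployment(4, 8))
```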
Always use 3.3 TB nodes when configuring unless additional nodes are required to increase the ingestion rate.
Use the default de-dupe rate unless a POC or assessment has been performed.
Sizing Considerations:
Non-RAIN nodes must be replicated. This includes single-node Avamar deployments and 1×2 configurations (1 utility node and 2 data store nodes, which is a non-RAIN config).
**** Remember this: As a general rule, transactional databases seem better suited to being backed up to Data Domain than to Avamar, because hashing databases is generally very slow.
VMware (specifically using the VMware Storage APIs) and CIFS are well suited for Avamar.
Data save rates:
Scan rate:
Performance:
Restores:
Data Fetch Process
NDMP Sizing:
L-0 fulls only happen once (we don’t want to size for them).
Size for the L-1 incrementals, which will happen in perpetuity following the completion of the L-0 full.
2 Accelerator Nodes
| Config | Max Files (Celerra) | Max Files (NetApp) | Max Data (Celerra) | Max Data (NetApp) | Max Streams (Celerra) | Max Streams (NetApp) |
| 6 GB | 5 m | 30 m | 4-6 TB | 4-6 TB | 1-2 | 1-2 |
| 36 GB | 40 m | 60 m | 8-12 TB | 8-12 TB | 4 | 4 |
NDMP throughput ~ 100 – 150 TB/hr
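As a rough illustration, the limits in the accelerator table above can be turned into a quick fit check. This is a sketch under assumptions: "m" is read as millions of files, the lower bound of each range is used as a conservative limit, and the function and dictionary names are hypothetical.

```python
# Limits transcribed from the table above. Assumptions: "m" means million
# files, and the lower bound of each range is used to stay conservative.
LIMITS = {
    "6 GB": {
        "Celerra": {"max_files": 5_000_000, "max_data_tb": 4, "max_streams": 1},
        "NetApp": {"max_files": 30_000_000, "max_data_tb": 4, "max_streams": 1},
    },
    "36 GB": {
        "Celerra": {"max_files": 40_000_000, "max_data_tb": 8, "max_streams": 4},
        "NetApp": {"max_files": 60_000_000, "max_data_tb": 8, "max_streams": 4},
    },
}


def smallest_fitting_config(filer, files, data_tb, streams):
    """Return the smallest accelerator config that fits the workload,
    or None if the workload exceeds both configs."""
    for config in ("6 GB", "36 GB"):
        lim = LIMITS[config][filer]
        if (files <= lim["max_files"]
                and data_tb <= lim["max_data_tb"]
                and streams <= lim["max_streams"]):
            return config
    return None


if __name__ == "__main__":
    # Example: a Celerra file system with 10 million files and 3 TB of data.
    print(smallest_fitting_config("Celerra", 10_000_000, 3, 1))
```

A result of None suggests the workload would need to be split across more than one accelerator or backup job.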
Assumed DeDupe Rates:
Tip: Based on scan rate and the amount of data stored for DB backups you can see why Avamar may not be the best choice for DB backups.
NDMP Tips:
Desktop / Laptop
Sizing:
DS18 can support ~ 5000 clients
The default number of streams per node is 18 (17 are usable; one should be reserved for restores).
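To put the stream numbers in context, here is a back-of-the-envelope calculation for a DS18. It assumes that only the 16 active data nodes carry backup streams and that each client uses one stream at a time; both are simplifying assumptions, and the function names are hypothetical.

```python
import math

ACTIVE_DATA_NODES = 16         # DS18: utility + spare + 16 active data nodes
DEFAULT_STREAMS_PER_NODE = 18  # default streams per node
RESERVED_FOR_RESTORES = 1      # one stream per node kept free for restores


def usable_streams(nodes=ACTIVE_DATA_NODES,
                   streams_per_node=DEFAULT_STREAMS_PER_NODE,
                   reserved=RESERVED_FOR_RESTORES):
    """Concurrent backup streams available across the grid."""
    return nodes * (streams_per_node - reserved)


def backup_waves(clients, streams):
    """Number of sequential waves needed if each client takes one stream."""
    return math.ceil(clients / streams)


if __name__ == "__main__":
    streams = usable_streams()
    print(streams)                      # 16 * 17 = 272
    print(backup_waves(5000, streams))  # waves needed for ~5000 clients
```

Under these assumptions, ~5000 desktop/laptop clients would cycle through the grid in 19 waves, so the backup window has to accommodate that many rounds of client backups.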
That completes the brain dump. Wish I had more but that is all for now.
This blog is a high-level overview of some extensive testing conducted on the EMC (CLARiiON) CX3-80 with 15K RPM FC (fibre channel) disk and the EMC (CLARiiON) CX4-120 with EFD (Enterprise Flash Drives), formerly known as SSD (solid state disk).
Figure 1: CX4-120 with EFD test configuration.
Figure 2: CX3-80 with 15K RPM FC test configuration.
Figure 3: IOPs Comparison
Figure 4: Response Time
Figure 5: IOPs Per Drive
Notice that the CX3-80 15K FC drives are servicing ~250 IOPs per drive, which exceeds the theoretical maximum of 180 IOPs for a 15K FC drive. This is due to write caching. Note that cache is disabled for the CX4-120 EFD tests. This is important because a high write I/O load can cause what are known as forced cache flushes, which can dramatically impact the overall performance of the array. Because cache is disabled on EFD LUNs, forced cache flushes are not a concern.
The table below provides a summary of the test configuration and findings:
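The per-drive numbers translate into aggregate figures as follows. This is simple arithmetic that assumes the 24-drive CX3-80 configuration used in this test and an even spread of I/O across the drives; the helper name is hypothetical.

```python
def aggregate_iops(drives, iops_per_drive):
    """Total IOPs for a group of drives, assuming an even spread of I/O."""
    return drives * iops_per_drive


FC_DRIVES = 24           # 15K FC drives in the CX3-80 test configuration
OBSERVED_PER_DRIVE = 250
THEORETICAL_PER_DRIVE = 180

observed_total = aggregate_iops(FC_DRIVES, OBSERVED_PER_DRIVE)        # 6000
theoretical_total = aggregate_iops(FC_DRIVES, THEORETICAL_PER_DRIVE)  # 4320
cache_uplift = OBSERVED_PER_DRIVE / THEORETICAL_PER_DRIVE             # ~1.39x

print(observed_total, theoretical_total, round(cache_uplift, 2))
```

In other words, write caching lets the FC drives appear to deliver roughly 39% more IOPs than the spindles alone could sustain.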
| Array | CX3-80 | CX4-120 |
| Configuration | (24) 15K FC Drives | (7) EFD Drives |
| Cache | Enabled | Disabled |
| Footprint | ~42% drive footprint reduction | |
| Sustained Random Read Performance | ~12x increase over 15K FC | |
| Sustained Random Write Performance | ~5x increase over 15K FC | |
In summary, EFD is a game-changing technology. There is no doubt that for small-block random read and write workloads (e.g., Exchange, MS SQL, Oracle, etc.) EFD dramatically improves performance and reduces the risk of performance issues.
This post is intended to be an overview of the exhaustive testing that was performed. I have results with a wide range of transfer sizes beyond the 2k and 4k results shown in this post, and I also have Jetstress results. If you are interested in data that you don’t see in this post, please email me at rbocchinfuso@gmail.com.
In the interest of benchmarking de-duplication rates with databases I created a process to build a test database, load test records, dump the database and perform a de-dupe backup using EMC Avamar on the dump files. The process I used is depicted in the flowchart below.
1. Create a DB named testDB
2. Create 5 DB dump target files – testDB_backup(1-5)
3. Run the test, which inserts 1000 random rows, each consisting of 5 random fields. Once the first insert is completed, a dump is performed to testDB_backup1. Once the dump is complete, a de-dupe backup process is performed on the dump file. This process is repeated 4 more times, each time adding an additional 1000 rows to the database, dumping to a new testDB_backup file, and performing the de-dupe backup process. (NOTE: each dump includes the existing DB records and the newly inserted rows.)
Once the backup is completed a statistics file is generated showing the de-duplication (or commonality) ratios. The output from this test is as follows:
You can see that each iteration of the backup shows an increase in the data set size with increasing commonality and de-dupe ratios. This test shows that, even with 100% random database data, a DB dump and de-dupe backup strategy can be a good solution for DB backup and archiving.
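For readers who want to reproduce the shape of this test without an Avamar grid, here is a minimal sketch. It is not the script used for the results above: it assumes SQLite as the database, keeps the dumps in memory rather than in testDB_backup files, and substitutes a crude fixed-size chunk hash comparison for the Avamar de-dupe backup. The table name, chunk size, and function names are all hypothetical.

```python
import hashlib
import random
import sqlite3
import string

ROWS_PER_ITERATION = 1000
FIELDS_PER_ROW = 5
ITERATIONS = 5
CHUNK_SIZE = 4096  # hypothetical fixed chunk size; Avamar's chunking differs


def random_field(rng, length=16):
    """Return one random alphanumeric field value."""
    return "".join(rng.choices(string.ascii_letters + string.digits, k=length))


def run_test(seed=0):
    """Insert rows, dump, and estimate commonality for each iteration.

    Returns a list of (iteration, dump_size_bytes, commonality_fraction).
    """
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")  # stands in for testDB
    conn.execute(
        "CREATE TABLE records (f1 TEXT, f2 TEXT, f3 TEXT, f4 TEXT, f5 TEXT)"
    )
    seen_chunks = set()  # chunk hashes already "backed up"
    results = []
    for i in range(1, ITERATIONS + 1):
        rows = [
            tuple(random_field(rng) for _ in range(FIELDS_PER_ROW))
            for _ in range(ROWS_PER_ITERATION)
        ]
        conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?)", rows)
        conn.commit()
        # Full SQL dump: existing records plus the newly inserted rows.
        dump = "\n".join(conn.iterdump()).encode("utf-8")
        hashes = [
            hashlib.sha1(dump[o:o + CHUNK_SIZE]).hexdigest()
            for o in range(0, len(dump), CHUNK_SIZE)
        ]
        common = sum(1 for h in hashes if h in seen_chunks)
        seen_chunks.update(hashes)
        results.append((i, len(dump), common / len(hashes)))
    return results


if __name__ == "__main__":
    for iteration, size, commonality in run_test():
        print(f"dump {iteration}: {size} bytes, commonality {commonality:.0%}")
```

Even though every row is random, commonality rises from one dump to the next because each new dump repeats all of the earlier rows, which is the same effect the Avamar statistics show.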
I just completed a fairly comprehensive EMC CLARiiON AX demo video. The demo video is available on YouTube and I am also hosting a high quality video here.
EMC CLARiiON AX Demo