HP D2D2503i Manual Download | Manualshive

4 Data deduplication

In this chapter:

• What is data deduplication? (page 19)
• Data deduplication and the HP StoreOnce Backup System (page 19)
• Tape rotation example with data deduplication (page 20)

What is data deduplication?

Data deduplication is a process that compares blocks of data being written to the backup device
with data blocks previously stored on the device. If duplicate data is found, a pointer to the
original data is established instead of storing the duplicate data again. This removes, or
“deduplicates,” the redundant blocks. Crucially, deduplication is done at the block level rather
than the file level, which significantly reduces the volume of data stored.

Figure 3 Data stored after deduplication

The importance of the Index files

As a backup stream arrives at the HP StoreOnce Backup System, the stream of data is “chunked”
into nominal 4K chunks. A hashing algorithm is run on each of these 4K chunks, producing
a unique digital fingerprint that is written to an index file.

This process is repeated in real time for every chunk of data in the first backup stream. When
subsequent backups run, it is highly likely that they will produce identical hash codes. In that
case the hash count in the index is increased, and the data associated with the hash code is not
stored again because it already resides in the Deduplication Store. The data for any given hash
code is therefore stored only once, hence the name StoreOnce.
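The chunk, hash, and count steps described above can be sketched as follows. This is a minimal illustration only, not the StoreOnce implementation: fixed-size 4096-byte chunking, SHA-256 as the hashing algorithm, and the names `ingest`, `index`, and `store` are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # "nominal 4K"; fixed-size chunking is an assumption


def ingest(stream: bytes, index: dict, store: dict) -> list:
    """Chunk a backup stream, fingerprint each chunk, and store new chunks once.

    Returns the ordered list of fingerprints that describes this backup.
    """
    recipe = []
    for offset in range(0, len(stream), CHUNK_SIZE):
        chunk = stream[offset:offset + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()  # SHA-256 is an assumption
        if fp in index:
            index[fp] += 1       # duplicate: only increase the hash count
        else:
            index[fp] = 1        # new data: record the hash and store the chunk
            store[fp] = chunk
        recipe.append(fp)
    return recipe


index, store = {}, {}
first_backup = bytes(4096) + b"\x01" * 4096    # two distinct chunks
second_backup = bytes(4096) + b"\x02" * 4096   # one repeated chunk, one new chunk
ingest(first_backup, index, store)
ingest(second_backup, index, store)
print(len(store))  # number of chunks physically stored
```

Four chunks are written across the two backups, but the repeated chunk is stored only once and its count in the index rises to 2.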

The Index files contain the mapping for the hashed data chunks created by deduplication and are
the main point of reference accessed and updated by both replication and housekeeping. Without
them, data cannot be restored successfully.
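A toy model can show why the mapping matters for restores. In this sketch the dictionary `store` stands in for the mapping from hash code to stored chunk, and `recipe` is the ordered list of hash codes for one backup; both names, and the use of SHA-256, are assumptions for illustration rather than details of the real index file format.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Hash a chunk (SHA-256 is an assumption, not the documented algorithm)."""
    return hashlib.sha256(chunk).hexdigest()


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild a backup by looking up every hash code in the mapping."""
    return b"".join(store[fp] for fp in recipe)


chunks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
store = {fingerprint(c): c for c in chunks}   # two unique chunks are kept
recipe = [fingerprint(c) for c in chunks]     # three references to them

restored = restore(recipe, store)

# With the mapping lost, the stored chunks can no longer be located.
try:
    restore(recipe, {})
    restore_without_mapping_ok = True
except KeyError:
    restore_without_mapping_ok = False
```

With the mapping intact, `restored` matches the original data byte for byte; with an empty mapping the lookup fails, which mirrors the statement that data cannot be restored without the Index files.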

Data deduplication and the HP StoreOnce Backup System

Data deduplication is applied per library device or share. When you configure the library or share,
it defaults to deduplication enabled; this cannot be disabled.

A device is associated with a host server, and deduplication allows a greater amount of backup
history to be stored for that host. More full backups can be retained, which makes a rotation
strategy with a longer retention history possible. Deduplication does not increase the number of
host servers that may be connected. The deduplication factor that has been applied to a device is
calculated and displayed on the Web Interface. This figure is dynamic: it updates automatically
as more data is written to the device.
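The text does not give the formula behind the displayed figure. A common way to express a deduplication factor is the ratio of data written by the host to data physically stored, and the sketch below assumes that definition. The function name `dedup_factor` and the backup sizes are hypothetical values chosen only to show how such a figure changes as more data is written.

```python
def dedup_factor(logical_gb: float, stored_gb: float) -> float:
    """Ratio of data written by the host to data physically stored (assumed definition)."""
    if stored_gb <= 0:
        raise ValueError("stored_gb must be positive")
    return logical_gb / stored_gb


written = 0.0
stored = 0.0
history = []
# (data written by the host, new unique data stored) per backup, in GB; hypothetical
for logical, new_unique in [(500, 500), (500, 25), (500, 25)]:
    written += logical
    stored += new_unique
    history.append(round(dedup_factor(written, stored), 2))

print(history)
```

Under these assumed numbers the factor starts at 1.0 after the first full backup and rises with each later backup, because most of the newly written data is already present on the device.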
