
Hardware recommendations

Experian Pandora is a self-contained client/server product that runs on most Java-compliant operating systems on commodity hardware. It takes full advantage of 64-bit architectures, and is multi-threaded and linearly scalable.

Experian Pandora can be installed as a full client/server or multi-user installation. Once you have acquired the relevant hardware for your business needs, please see our Installation Guide.

Virus Checkers

We strongly recommend that you configure any installed anti-virus software so that it does not scan the directories containing Experian Pandora data files. Schedule any full system sweeps to run outside office hours, or outside periods when data loading will occur. This avoids performance drops during critical operations.

System recommendations

These are recommendations and should be used as a reference point. You don't have to follow the specifications to the letter. Instead, use them to determine the type of component you need to get acceptable performance for the business uses in question.

They are provided in four tiers based on common usage levels: Minimum, Small, Medium and Large workloads.

Minimum requirements

Important: These are the minimum specifications that will allow you to use Experian Pandora. However, we don't recommend running Experian Pandora at this level. We can't guarantee acceptable performance unless you're a single user running the product on a single machine.

Minimum server

Component | Minimum
--- | ---
Operating System (OS) | Windows 64-bit (7, 8.1 and 10, Server 2003, 2008 and 2012), openSUSE Linux 64-bit, Red Hat Linux 64-bit, Solaris 11, HP-UX 11 and AIX 64-bit
Processor (CPU) | Intel Core i3 dual-core, 2 GHz or faster (e.g. Intel Core i3-6100), 1 user
Memory (RAM) | 8 GB available for Experian Pandora and the OS (any RAM running at 1333 MHz)
Disk (HDD, SSD, etc.) | 7,200 rpm HDD for the OS and any SSD for Experian Pandora
Network Card | Gigabit Ethernet

Minimum client

Component | Minimum
--- | ---
Operating System (OS) | Windows 32-bit or 64-bit (7, 8.1 and 10)
Processor (CPU) | Intel Core i3 dual-core, 2 GHz or faster (e.g. Intel Core i3-6100)
Memory (RAM) | 1 GB
Disk (HDD, SSD, etc.) | Any HDD with ~300 MB free space for the client install
Network Card | Gigabit Ethernet

Small workload

This would typically be for 1-3 or 1-4 users (depending on the processor chosen) working with up to ~300 million rows on a monthly basis.

Server recommendation

Component | Recommended
--- | ---
Operating System (OS) | Windows 64-bit (7, 8.1 and 10, Server 2003, 2008 and 2012), openSUSE Linux 64-bit, Red Hat Linux 64-bit, Solaris 11, HP-UX 11 and AIX 64-bit
Processor (CPU) | Intel Core i7 4.0 GHz quad-core (Intel Core i7-6700K) for 1-3 users, or Intel Xeon 3.5 GHz six-core (Intel Xeon E5-1650) for 1-4 users
Memory (RAM) | 16 GB available for Experian Pandora and the OS (any RAM running at 1600 MHz)
Disk (HDD, SSD, etc.) | 7,200 rpm HDD for the OS and SAS drives or SSDs for Experian Pandora
Network Card | Gigabit Ethernet

Client recommendation

Component | Recommended
--- | ---
Operating System (OS) | Windows 32-bit or 64-bit (7, 8.1 and 10)
Processor (CPU) | Intel Core i3 dual-core, 2 GHz or faster (e.g. Intel Core i3-6100)
Memory (RAM) | 4 GB
Disk (HDD, SSD, etc.) | Any HDD with ~300 MB free space for the client install
Network Card | Gigabit Ethernet

Medium workload

This would typically be for 3-6 users working with up to ~1 billion rows on a monthly basis.

Server recommendation

Component | Recommended
--- | ---
Operating System (OS) | Windows 64-bit (Server 2008 and 2012), openSUSE Linux 64-bit, Red Hat Linux 64-bit, Solaris 11, HP-UX 11 and AIX 64-bit
Processor (CPU) | Intel Xeon 2.6 GHz eight-core (Intel Xeon E5-2640) for 3-6 users, or dual Intel Xeon 3.5 GHz six-core (2x Intel Xeon E5-1650) for 3-6 users
Memory (RAM) | 32 GB available for Experian Pandora and the OS (any RAM running at 2133 MHz)
Disk (HDD, SSD, etc.) | Multiple SAS drives or Enterprise SSDs, with Experian Pandora on a separate drive from the OS
Network Card | Gigabit Ethernet

Client recommendation

Component | Recommended
--- | ---
Operating System (OS) | Windows 32-bit or 64-bit (7, 8.1 and 10)
Processor (CPU) | Intel Core i3 dual-core, 2 GHz or faster (e.g. Intel Core i3-6100)
Memory (RAM) | 4 GB
Disk (HDD, SSD, etc.) | Any HDD with ~300 MB free space for the client install
Network Card | Gigabit Ethernet

Large workload

This would typically be for 5 or more users working with more than 1 billion rows on a monthly basis.

Server recommendation

Component | Recommended
--- | ---
Operating System (OS) | Windows 64-bit (Server 2012), openSUSE Linux 64-bit, Red Hat Linux 64-bit, Solaris 11, HP-UX 11 and AIX 64-bit
Processor (CPU) | Dual Intel Xeon 2.6 GHz eight-core (2x Intel Xeon E5-2640), 5+ users
Memory (RAM) | 120 GB available for Experian Pandora and the OS (any RAM running at 2133 MHz)
Disk (HDD, SSD, etc.) | 4 x 400 GB 12G SAS SSDs in RAID 0 for Experian Pandora data (around 1.5 TB in total); 2 x 300 GB 6G SAS 15k rpm HDDs, mirrored, for the OS, partitioned into 60 GB (OS) and 160 GB (Temp)
Network Card | Gigabit Ethernet
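As a rough check on the Large workload disk layout, the sketch below computes usable capacity for two RAID levels. It uses hypothetical helper names and ignores controller and filesystem overhead. In RAID 0, striping adds up every drive's capacity but gives no redundancy. In RAID 1, mirroring leaves the capacity of a single drive. The sketch also converts decimal gigabytes to binary tebibytes: 4 x 400 GB is 1,600 GB, or about 1.46 TiB. That conversion may be why the total is given as around 1.5TB.

```python
def usable_capacity_gb(drive_gb: float, drives: int, level: str) -> float:
    """Usable capacity of a simple RAID set (hypothetical helper, no overhead modelled)."""
    if level == "raid0":
        # Striping: every drive contributes its full capacity, but losing
        # any single drive loses the whole array.
        return drive_gb * drives
    if level == "raid1":
        # Mirroring: every drive holds a full copy, so usable space is one drive.
        return drive_gb
    raise ValueError(f"unsupported RAID level: {level}")


def gb_to_tib(gb: float) -> float:
    """Convert decimal gigabytes (10**9 bytes) to binary tebibytes (2**40 bytes)."""
    return gb * 10**9 / 2**40


data_array = usable_capacity_gb(400, 4, "raid0")  # SSDs for Experian Pandora data
os_mirror = usable_capacity_gb(300, 2, "raid1")  # mirrored HDDs for the OS
print(f"data array: {data_array:.0f} GB = {gb_to_tib(data_array):.2f} TiB")
print(f"OS mirror: {os_mirror:.0f} GB, of which {60 + 160} GB is partitioned (OS + Temp)")
```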

Client recommendation

Component | Recommended
--- | ---
Operating System (OS) | Windows 32-bit or 64-bit (7, 8.1 and 10)
Processor (CPU) | Intel Core i3 dual-core, 2 GHz or faster (e.g. Intel Core i3-6100)
Memory (RAM) | 4 GB
Disk (HDD, SSD, etc.) | Any HDD with ~300 MB free space for the client install
Network Card | Gigabit Ethernet
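The workload descriptions overlap. For example, 3-4 users fit both Small and Medium, and 5-6 users fit both Medium and Large. Any automated sizing therefore has to pick a rule. The Python sketch below is one hypothetical interpretation, not an Experian sizing tool. It works in three steps:

  • It chooses the smallest tier whose user and monthly-row limits both fit.
  • It looks up the server RAM from the tables above.
  • It applies the doubling that the virtual machine section advises for VM/SAN setups.

```python
# Hypothetical sizing helper. Thresholds come from the workload descriptions;
# where tiers overlap, the smallest tier covering both figures is chosen.
SERVER_RAM_GB = {"minimum": 8, "small": 16, "medium": 32, "large": 120}


def workload_tier(users: int, rows_per_month: int) -> str:
    """Return the smallest workload tier whose user and row limits both fit."""
    if users <= 4 and rows_per_month <= 300_000_000:
        return "small"
    if users <= 6 and rows_per_month <= 1_000_000_000:
        return "medium"
    return "large"


def recommended_server_ram_gb(tier: str, vm_with_san: bool = False) -> int:
    """Server RAM from the tables, doubled for a VM/SAN setup as advised."""
    ram = SERVER_RAM_GB[tier]
    return ram * 2 if vm_with_san else ram


tier = workload_tier(users=5, rows_per_month=500_000_000)
print(tier, recommended_server_ram_gb(tier), recommended_server_ram_gb(tier, vm_with_san=True))
```

With 5 users and 500 million rows a month, this rule picks the Medium tier: 32 GB of RAM, or 64 GB on a VM/SAN setup.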

Storage devices and services

The following terms commonly come up when considering storage for Experian Pandora:

  • IOPS (input/output operations per second) - how many I/O requests the storage device can complete per second. From an Experian Pandora standpoint, IOPS isn't a particularly useful measure, because access to the repository is largely random. The achievable figure depends on too many parameters to be practically useful: the actual data, the rules, the number of users, the volumes and desired processing times, the operations carried out, and so on.
  • latency - how long it takes for an I/O request to be serviced, measured in milliseconds (ms)
  • throughput - the actual speed of the data transfer, most often measured in MB/s (megabytes per second)
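A short worked example shows why a bare IOPS figure says little on its own. Two standard relationships apply:

  • Throughput is roughly IOPS multiplied by the size of each request.
  • By Little's law, the number of requests that must be in flight is IOPS multiplied by latency.

The Python sketch below applies both to an arbitrary figure of 10,000 IOPS. The numbers are illustrative, not measurements of any device.

```python
def throughput_mb_s(iops: float, io_size_bytes: int) -> float:
    """Throughput implied by an IOPS figure at a fixed request size (1 MB = 10**6 bytes)."""
    return iops * io_size_bytes / 1_000_000


def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's law: average requests in flight = completion rate x time per request."""
    return iops * latency_ms / 1000


# The same 10,000 IOPS means very different MB/s depending on the request size.
for size in (4 * 1024, 64 * 1024, 1024 * 1024):
    print(f"{size // 1024:>5} KiB requests -> {throughput_mb_s(10_000, size):8.1f} MB/s")

# Sustaining 10,000 IOPS needs ~1 request in flight at 0.1 ms latency,
# but ~90 in flight at 9 ms (the HDD read latency bound in the table below).
print(outstanding_ios(10_000, 0.1), outstanding_ios(10_000, 9))
```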

Comparison table

Type | Average throughput (MB/s) | Average latency, read / write (ms)
--- | --- | ---
HDD | 150 – 600 | < 9 / < 5
SSD | 500 – 2400 | 0.5 / 0.1
SAS HDD | 300 – 1500 | < 9 / < 0.9
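To make the throughput ranges more tangible, the sketch below estimates a full sequential read of a 100 GB dataset on each storage type, using the figures from the table. The dataset size and helper names are hypothetical. The result is a best case: it ignores latency, and Experian Pandora's largely random access to the repository will usually be slower than a sequential stream.

```python
# Average throughput ranges (MB/s) copied from the comparison table.
THROUGHPUT_MB_S = {
    "HDD": (150, 600),
    "SSD": (500, 2400),
    "SAS HDD": (300, 1500),
}


def sequential_read_seconds(size_gb: float, mb_s: float) -> float:
    """Time to stream size_gb (decimal GB) at a sustained mb_s, ignoring latency."""
    return size_gb * 1000 / mb_s


def scan_time_range(size_gb: float) -> dict:
    """(best, worst) full-scan seconds per storage type; worst uses the lowest throughput."""
    return {
        kind: (sequential_read_seconds(size_gb, hi), sequential_read_seconds(size_gb, lo))
        for kind, (lo, hi) in THROUGHPUT_MB_S.items()
    }


for kind, (best, worst) in scan_time_range(100).items():
    print(f"{kind:8} 100 GB scan: {best / 60:5.1f} to {worst / 60:5.1f} minutes")
```

For example, reading 100 GB takes about 11 minutes at 150 MB/s and about 42 seconds at 2400 MB/s.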

See below for further information on some of the storage types available.

HDD (Hard Disk Drive)

We don't recommend standard hard disk drives, purely from a performance standpoint. They are only suitable if you do very little work and that work is not time-sensitive or critical. These drives typically spin at 5,400 or 7,200 rpm, which dictates how quickly they can retrieve data from their disks.

SSD (Solid State Drive)

Standard (consumer) solid state drives are recommended only for light workloads, due to their finite lifespan. We do recommend Enterprise-level SSDs. They provide the performance of a standard SSD, but they have the purpose-built lifespan required for a server environment. They can also handle both high-volume and time-sensitive data.

Expect average throughput of 500 MB/s – 2400 MB/s, with read/write latencies (response times) consistently below 0.5 ms and 0.1 ms respectively.

SAS HDD (Serial Attached SCSI Hard Disk Drive)

These are Enterprise-level HDDs developed with 24/7 uptime, servers and high workloads in mind. They typically spin at 10k or 15k rpm. We recommend them for systems where Enterprise SSDs are not practical. 15k SAS HDDs provide vastly improved performance over normal consumer HDDs.

Expect average throughput of 300 MB/s – 1500 MB/s, with read/write latencies of roughly < 9 ms and < 0.9 ms respectively.

SAN (Storage Area Network)

A SAN allows each server to access shared storage as if it were a drive directly attached to the server. Unfortunately, building a SAN involves many variables, so we can't give conclusive performance results. For reference, bear the following in mind:

A SAN is not a monolithic entity. If you take a switch-based Fibre-Channel SAN as an example, the I/O path that is considered part of the SAN starts from the host bus adapters (HBA) all the way to the eventual disk media on the disk array. Hardware components on this I/O path typically include switches, inter-switch links, front-side adapters, disk array cache and processors, disk controllers, and disk drive media. In addition, there are layers of software on this path, including various drivers, firmware, and APIs. Every single component on this I/O path has the potential to significantly alter the performance characteristics of a drive presented from the SAN.

The above is quoted from the blog at http://www.sqlteam.com/article/which-is-faster-san-or-directly-attached-storage#sthash.RRvlQT9W.dpuf

We share this view, and for that reason we are unable to recommend SANs. Their implementation involves too many variables, and many of these may be completely outside the control of both us and end users.

AWS (Amazon Web Services)

We recommend the following Amazon Web Services (AWS) instances:

  • For standard usage, any of the D2 instance types.
  • For maximum performance, any I2 instance.

The specific instance size you need then depends on the number of users and the size of the dataset.

Azure (Microsoft Cloud Solution)

We recommend the following Microsoft Azure packages:

  • For standard usage, package D13 v2 or above from the Standard tier (not the Basic tier).
  • For maximum performance, package DS13 or above, or a GS-series instance, together with as much Premium Storage as your needs require.

Virtual machine considerations

Experian Pandora runs well on virtual machines with smaller datasets. For larger datasets or intensive work, however, you need to take certain things into account to get maximum performance from its full capabilities.

Note: Very large datasets should be processed on a completely dedicated server, not a virtual machine.

  • Configure the VM with a fixed MAC address, not a dynamically assigned one. The licence depends on the MAC address, so it will stop working if the address changes.
  • Allocate dedicated CPUs to the VM. This makes full performance available to Experian Pandora. Sharing CPU resources across multiple VMs or applications on the host can lower performance.
  • Use directly attached storage for the VM if possible, although depending on your VM configuration this may not be available. Most VMs are quick builds that share hardware resources, which can hurt performance for a database storage engine like Experian Pandora. Storage can be slower, and, more importantly, other applications on the host can hinder performance.
  • If you're using a VM/SAN setup, we advise doubling the recommended memory. The extra memory can be allocated to caching, which helps offset the SAN's performance cost. As stated above, however, we do not recommend SANs.
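One way to confirm that a VM keeps the same MAC address across reboots and migrations is to record the address the guest OS reports and compare it after a restart. The sketch below does this with Python's standard `uuid.getnode()`. It has two limitations:

  • On a machine with several network adapters, `getnode()` returns only one of them.
  • The sketch cannot tell which adapter the Experian Pandora licence checks against.

If `getnode()` cannot find a hardware address, it falls back to a random number with the multicast bit set. The sketch flags that case, because such a value changes every run.

```python
import uuid

MULTICAST_BIT = 1 << 40  # least significant bit of the first octet of a 48-bit MAC


def format_mac(node: int) -> str:
    """Render a 48-bit integer in the familiar aa:bb:cc:dd:ee:ff form."""
    return ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def looks_like_random_fallback(node: int) -> bool:
    """uuid.getnode() sets the multicast bit when it has to invent a random address."""
    return bool(node & MULTICAST_BIT)


if __name__ == "__main__":
    node = uuid.getnode()
    print("MAC reported to Python:", format_mac(node))
    if looks_like_random_fallback(node):
        print("No hardware address found; this value is random and will change.")
```

Run it before and after a reboot or migration; if the printed address differs, the VM's MAC is not fixed.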

Performance comparisons

Speed tests

Speed tests were measured with HD Tach in 8 MB zones:

  • Storage Burst Speed – how fast the storage can transfer data over a short burst, the equivalent of sprinting for a short period of time.
  • Storage Avg. Speed – the average speed at which the storage can transfer data.
  • Storage Response – the latency of the storage.
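HD Tach is a Windows disk benchmark that tests the drive at a low level. For a rough, scriptable comparison between two storage locations, the Python sketch below takes a file-level approach:

  • It writes incompressible data through the filesystem in 8 MB (8 MiB) zones.
  • It forces each zone to disk with `fsync`.
  • It reports the fastest zone and the mean speed.

It times writes rather than reads, because a freshly written file would mostly be read back from the operating system's cache. Its numbers will not match HD Tach's. Treating the fastest zone as a "burst" figure is only a loose analogy.

```python
import os
import tempfile
import time

ZONE_BYTES = 8 * 1024 * 1024  # 8 MiB per zone, mirroring the 8 MB zones above


def zone_write_speeds(directory: str, zones: int = 16) -> list:
    """Write `zones` chunks to a temporary file in `directory`; return MB/s per zone.

    Each zone is flushed and fsync'd so the timing includes the storage device,
    not just the OS write cache. A rough file-level approximation, not HD Tach.
    """
    chunk = os.urandom(ZONE_BYTES)  # incompressible data
    speeds = []
    with tempfile.TemporaryFile(dir=directory) as f:
        for _ in range(zones):
            start = time.perf_counter()
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
            elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
            speeds.append(ZONE_BYTES / 1_000_000 / elapsed)
    return speeds


if __name__ == "__main__":
    speeds = zone_write_speeds(tempfile.gettempdir())
    print(f"fastest zone: {max(speeds):.1f} MB/s")
    print(f"average:      {sum(speeds) / len(speeds):.1f} MB/s")
```

Point `directory` at the drive that will hold the Experian Pandora data to compare candidate storage.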

Dataset tests

Customer Record and Statistical are publicly available datasets that we have manipulated for use in our performance testing, to cover a varied set of data types. Customer Record is ~800k rows with 99 columns, and Statistical is ~120 million rows with 29 columns. They are available on request if you would like us to advise you on running your own comparison tests.

  • Customer Record Load time – the average time it takes Experian Pandora to load the Customer Record dataset.
  • Customer Record Dependency Analysis – the average time it takes Experian Pandora to discover dependencies in the Customer Record dataset, based on a scenario of 30 columns to 3 levels at a 98% threshold.
  • Customer Record Quality Report – the average time it takes Experian Pandora to generate the Quality Report for the Customer Record dataset.
  • Customer Record Validation – the average time it takes Experian Pandora to validate a single rule against a table lookup expression that returns all values.
  • Customer Record Linking – the average time it takes Experian Pandora to perform a filter based on linking two tables, both using the Customer Record dataset.
  • Statistical Load time – the average time it takes Experian Pandora to load the Statistical dataset.
  • Statistical Dependency Analysis – the average time it takes Experian Pandora to discover dependencies in the Statistical dataset, based on a scenario of 6 columns to 3 levels at a 98% threshold.
  • Statistical Quality Report – the average time it takes Experian Pandora to generate the Quality Report for the Statistical dataset.
  • CPU Cores + Storage Type – the last row of each table shows how many CPU cores each instance has and which storage type was used.
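The dependency analysis tests search for columns that determine another column in at least 98% of rows. The Python sketch below shows one common way to score such an approximate functional dependency, similar to the g3 error measure. It is not necessarily how Experian Pandora implements it, and the function names are hypothetical. We assume "3 levels" means determinant sets of up to three columns. On that assumption, each dependent column in a 30-column scenario has 29 + 406 + 3,654 = 4,089 candidate sets to score. A 6-column scenario has only 5 + 10 + 10 = 25. This may help explain why the Customer Record dependency analysis takes so long relative to the dataset's size.

```python
from collections import Counter, defaultdict
from itertools import combinations


def dependency_strength(rows, lhs, rhs):
    """Fraction of rows consistent with the dependency lhs -> rhs.

    Rows are grouped by their lhs values. Within each group the most common rhs
    value is treated as correct, and every other row counts as a violation.
    """
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[c] for c in lhs)][row[rhs]] += 1
    kept = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return kept / len(rows)


def find_dependencies(rows, columns, rhs, max_level=3, threshold=0.98):
    """All determinant sets of up to max_level columns that reach the threshold."""
    candidates = [c for c in columns if c != rhs]
    found = []
    for level in range(1, max_level + 1):
        for lhs in combinations(candidates, level):
            strength = dependency_strength(rows, lhs, rhs)
            if strength >= threshold:
                found.append((lhs, strength))
    return found


if __name__ == "__main__":
    # Ten postcodes, each paired with one town, plus one dirty value.
    rows = [{"postcode": f"PC{i % 10}", "town": f"T{i % 10}"} for i in range(50)]
    rows[0]["town"] = "typo"  # postcode -> town now holds for 49 of 50 rows (98%)
    print(find_dependencies(rows, ["postcode", "town"], "town"))
```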

AWS benchmarks

Results for Amazon Web Services EC2 instances, with a physical PC for comparison:

Test performed | D2.xlarge | D2.xlarge | D2.2xlarge | I2.xlarge | I2.2xlarge | I2.2xlarge | Physical PC
--- | --- | --- | --- | --- | --- | --- | ---
Storage Burst Speed (MB/s) | 246.6 | 275.3 | 322.8 | 415.5 | 412.7 | 214.4 | 359.7
Storage Avg. Speed (MB/s) | 148.7 | 117.7 | 142.1 | 319.5 | 313.9 | 129.2 | 308.2
Storage Response (ms) | 12.0 | 0.2 | 11.4 | 0.1 | 0.1 | 0.3 | 0.1
Customer Record Load time | 00:05:19 | 00:05:18 | 00:03:02 | 00:06:00 | 00:03:12 | 00:03:11 | 00:02:35
Customer Record Dependency Analysis | 00:18:43 | 00:14:20 | 00:13:20 | 00:15:45 | 00:08:46 | 00:08:51 | 00:07:33
Customer Record Quality Report | 00:04:53 | 00:05:35 | 00:04:13 | 00:07:09 | 00:05:29 | 00:05:12 | 00:03:19
Customer Record Validation | 00:07:35 | 00:07:07 | 00:05:47 | 00:09:01 | 00:06:30 | 00:06:24 | 00:05:06
Statistical Load time | 00:45:57 | 00:45:33 | 00:31:25 | 00:50:12 | 00:32:34 | 00:32:19 | 00:26:02
Statistical Dependency Analysis | 00:19:26 | 00:18:06 | 00:13:32 | 00:25:10 | 00:13:17 | 00:13:05 | 00:11:33
Statistical Quality Report | 00:00:10 | 00:00:11 | 00:00:06 | 00:00:10 | 00:00:06 | 00:00:11 | 00:00:06
CPU Cores + Storage Type | 4 cores, Standard | 4 cores, EBS | 8 cores, Standard | 4 cores, Standard | 8 cores, Standard | 8 cores, EBS* | 8 cores, Consumer SSD

Note: times are in the format hh:mm:ss

*This uses EBS drives only. It does not include the EBS optimisation option, which AWS includes with D2 instances but charges extra for with I2 instances.

The results above show a general trend: more powerful setups give faster times. However, some results don't follow this pattern. This is due to the virtualised nature of cloud-based systems and is not something you can always account for or predict.
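One way to read the AWS table is to express each result as a multiple of the physical PC's time. The Python sketch below does this for the Statistical Load time row. The dictionary simply copies values from the table, and the labels are ours. By this arithmetic:

  • The 4-core instances take roughly 1.75-1.93 times as long as the physical PC.
  • The 8-core instances take roughly 1.21-1.25 times as long.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string from the tables into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Statistical Load time row from the AWS table, in column order.
STATISTICAL_LOAD = {
    "D2.xlarge (Standard)": "00:45:57",
    "D2.xlarge (EBS)": "00:45:33",
    "D2.2xlarge (Standard)": "00:31:25",
    "I2.xlarge (Standard)": "00:50:12",
    "I2.2xlarge (Standard)": "00:32:34",
    "I2.2xlarge (EBS)": "00:32:19",
}
PHYSICAL_PC = "00:26:02"

baseline = to_seconds(PHYSICAL_PC)
for name, hms in STATISTICAL_LOAD.items():
    print(f"{name:24} {to_seconds(hms) / baseline:.2f}x the physical PC time")
```

The same loop works for any other row, or for the Azure table, by swapping in the copied values.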

Azure benchmarks

Results for Azure cloud instances, with a physical PC for comparison:

Test performed | D12 v2 | DS12 v2 | D13 v2 | G2 | G3 | GS3 | Physical PC
--- | --- | --- | --- | --- | --- | --- | ---
Storage Burst Speed (MB/s) | 42.9 | 1856.7 | 15.2 | 21.5 | 20.3 | 2352.1 | 359.7
Storage Avg. Speed (MB/s) | 23.7 | 139.5 | 13.0 | 13.9 | 13.5 | 1342.6 | 308.2
Storage Response (ms) | 4.2 | 0.2 | 5.1 | 5.3 | 6.2 | 0.1 | 0.1
Customer Record Load time | 00:04:13 | 00:04:11 | 00:02:55 | 00:03:32 | 00:02:47 | 00:02:48 | 00:02:35
Customer Record Dependency Analysis | 00:18:11 | 00:13:01 | 00:13:58 | 00:20:25 | 00:14:47 | 00:07:18 | 00:07:33
Customer Record Quality Report | 00:04:00 | 00:03:57 | 00:04:15 | 00:03:48 | 00:03:38 | 00:03:54 | 00:03:19
Customer Record Validation | 00:05:46 | 00:05:46 | 00:06:24 | 00:05:16 | 00:05:23 | 00:05:23 | 00:05:06
Statistical Load time | 00:52:51 | 00:34:29 | 01:08:47 | 03:12:31 | 00:28:24 | 00:28:01 | 00:26:02
Statistical Dependency Analysis | 00:21:10 | 00:16:42 | 00:14:13 | 00:15:47 | 00:13:47 | 00:13:14 | 00:11:33
Statistical Quality Report | 00:01:12 | 00:00:12 | 00:02:01 | 00:01:50 | 00:00:06 | 00:00:06 | 00:00:06
CPU Cores + Storage Type | 4 cores, Standard | 4 cores, Premium | 8 cores, Standard | 4 cores, Standard | 8 cores, Standard | 8 cores, Premium | 8 cores, Consumer SSD

Note: times are in the format hh:mm:ss

The results above show a general trend: more powerful setups give faster times. However, some results don't follow this pattern. This is due to the virtualised nature of cloud-based systems and is not something you can always account for or predict.
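To put a number on that general trend for the Azure results, the Python sketch below computes Spearman's rank correlation between Storage Avg. Speed and Statistical Load time across all seven columns. It uses the simple formula 1 - 6*sum(d^2) / (n*(n^2 - 1)), which is only valid when there are no tied values; that holds for these two rows. The result is about -0.68. Faster storage tends to go with shorter load times, but not consistently. G3, for example, loads faster than DS12 v2 despite far lower average storage speed. Its core count (8 versus 4) may play a part.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Columns: D12 v2, DS12 v2, D13 v2, G2, G3, GS3, physical PC (from the Azure table).
avg_speed_mb_s = [23.7, 139.5, 13.0, 13.9, 13.5, 1342.6, 308.2]
statistical_load_s = [3171, 2069, 4127, 11551, 1704, 1681, 1562]  # hh:mm:ss in seconds

print(f"Spearman rho: {spearman_rho(avg_speed_mb_s, statistical_load_s):.2f}")
```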

Copyright ©, 2014-2017. All rights reserved.