Lexicon Big Data Mining for Enhanced Security

A brief paper on Honeycomb Lexicon’s Big Data Mining feature.

Got Data? And Lots Of It?
Why not get the most out of it and gain the
business intelligence your organization deserves?
Lexicon Big Data Mining – What Is It?
Honeycomb’s Lexicon Big Data Mining reads in very large amounts of arbitrary data from multiple sources, classifies it,
archives it, and indexes it – then allows you to easily search through it, perform pattern analysis, look for correlations
between disparate data types, and map your data to known vulnerability and attack surface data stores.
How Does It Work?
Lexicon Big Data Mining starts by trawling through your mass of data – be
it historical logs, live feeds, archives, disk dumps of arbitrary information,
custom file structures, you name it – Lexicon will read it in and classify it.
Once classified, it is then piped through Lexicon’s indexing engine, which
extracts, stores and indexes the data.
This allows you, as well as Lexicon’s Analytics engines and third-party
SIEMs, to search through some or all of the data, pulling out interesting
trends, mapping suspicious and malicious patterns, and plotting
behavioural models and anomalies.
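The read, classify, index and search flow described above can be illustrated with a minimal Python sketch. It is not Lexicon’s implementation or API: the type signatures, the `classify` and `tokenize` helpers and the `Index` class are hypothetical, chosen only to show how classified records can feed an inverted index that then supports keyword and per-type searches, including over records that match no known type.

```python
import re
from collections import defaultdict

# Hypothetical type signatures for illustration only; these are not
# Lexicon's classification rules.
SIGNATURES = {
    # Apache Common Log Format: host ident user [time] "METHOD ...
    "apache_access": re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "(GET|POST|PUT|DELETE|HEAD) '),
    # Cisco ASA syslog message prefix, e.g. %ASA-4-106023:
    "cisco_asa": re.compile(r"%ASA-\d-\d{6}:"),
}


def classify(record: str) -> str:
    """Return the first matching type label, or 'unknown' for unrecognized data."""
    for label, pattern in SIGNATURES.items():
        if pattern.search(record):
            return label
    return "unknown"


def tokenize(record: str) -> set:
    """Lower-case word-like tokens; dots and hyphens are kept so IPs stay whole."""
    return set(re.findall(r"[a-z0-9_.-]+", record.lower()))


class Index:
    def __init__(self):
        self.records = []                 # archive of (label, raw record)
        self.postings = defaultdict(set)  # token -> ids of records containing it

    def add(self, record: str) -> int:
        rid = len(self.records)
        self.records.append((classify(record), record))
        for token in tokenize(record):
            self.postings[token].add(rid)
        return rid

    def search(self, *terms, label=None):
        """Records containing every term, optionally limited to one data type."""
        ids = set(range(len(self.records)))
        for term in terms:
            ids &= self.postings.get(term.lower(), set())
        return [self.records[i] for i in sorted(ids)
                if label is None or self.records[i][0] == label]


idx = Index()
idx.add('10.0.0.5 - - [01/Jan/2015:10:00:00 +0000] "GET /login HTTP/1.1" 401 12')
idx.add("%ASA-4-106023: Deny tcp src outside:10.0.0.5/4444 dst inside:192.168.1.2/22")
idx.add("custom app: user bob locked out from 10.0.0.5")
for label, record in idx.search("10.0.0.5"):
    print(label)  # prints apache_access, cisco_asa, unknown
```

A single search for one IP address returns hits from a web log, a firewall log and an unrecognized custom format alike, which is the kind of cross-source view the paper describes.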
Great, So How Does This Help Me?
Now that Lexicon Big Data has read in and performed initial classification of your data, the real gains can be realized.
Lexicon Big Data can now perform many useful tasks on stored data, including:
• Find correlations between disparate sections of data
• Link events and external knowledge stores (e.g. lists of known malware web sites)
• Seek out patterns of behaviour across multiple data sets and types
• Forward specific event types to third-party products and SIEMs
• Generate alarms that can notify security staff, launch programs, or forward to third-party tools and SIEMs
• Produce and publish reports and summaries on an ad hoc or scheduled basis
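As a rough illustration of correlating disparate data with an external knowledge store, the sketch below checks proxy events against a list of known malware hosts and attributes each hit to the most recent logon from the same IP address. The `Event` layout, the `correlate` function, the source labels and the eight-hour window are all assumptions made for this example, not Lexicon features.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from urllib.parse import urlsplit


@dataclass
class Event:
    source: str    # hypothetical type label: "proxy" or "windows_logon"
    time: datetime
    ip: str
    detail: str    # URL for proxy events, user name for logon events


def correlate(events, malware_hosts, window=timedelta(hours=8)):
    """Yield an alarm for each proxy hit on a known-malware host, attributed
    to the latest logon from the same IP within `window` before the hit."""
    logons = sorted((e for e in events if e.source == "windows_logon"),
                    key=lambda e: e.time)
    for hit in events:
        if hit.source != "proxy":
            continue
        host = urlsplit(hit.detail).hostname
        if host not in malware_hosts:
            continue
        user = None
        for logon in logons:  # sorted by time, so the last match wins
            if logon.ip == hit.ip and hit.time - window <= logon.time <= hit.time:
                user = logon.detail
        yield {"time": hit.time, "ip": hit.ip, "host": host, "user": user}


t0 = datetime(2015, 1, 1, 9, 0)
events = [
    Event("windows_logon", t0, "10.0.0.5", "alice"),
    Event("proxy", t0 + timedelta(hours=1), "10.0.0.5", "http://bad.example.net/payload"),
    Event("proxy", t0 + timedelta(hours=1), "10.0.0.6", "http://good.example.org/"),
]
alarms = list(correlate(events, {"bad.example.net"}))
print(alarms)
```

Only the request to the listed host raises an alarm, and the logon record supplies the user name that the proxy log alone lacks.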
Lexicon Big Data also archives all unfiltered data, allowing it to be stored and searched offline, backed up, merged with
new data and so on, supporting unlimited data retention.
© Copyright 2015 Honeycomb Technologies Ltd. All Rights Reserved.
What Sort Of Data Can Lexicon Big Data Consume?
One of the great advantages of Lexicon Big Data is its ability to handle any kind of data – Lexicon Big Data will scan,
classify and read in any type of data, which can then be searched regardless of its origin. This agnostic approach to data
allows you to correlate and build patterns across diverse data types, without having to know what those types are.
Lexicon Big Data also recognizes a set of known data types, for which it provides further processing.
Here is a list of just some of these:
• Cisco Firewall, Switch and VPN
• Palo Alto Firewall
• McAfee Security Manager Events
• Juniper Networks and SSL VPN
• F5
• Tenable Nessus Scans
• Web Filters and Proxies
• SonicWall
• Pirean
• Windows Event Logs
• Blue Coat
• Norse Security
• Exchange Message Tracking
• Lexicon File Integrity Monitoring
• Honeywell Door Entry Systems
• WatchGuard
• Videx Biometrics
• Apache Web Server
• VMware
• Genetec Surveillance
What About Unknown/Unstructured Data?
Even when data is not in a standard format, or is unformatted or in a custom
structure, Lexicon Big Data can still read it in and make it searchable in
the same way as well-known data sources such as those listed above.
This brings visibility to valuable information that traditional SIEMs and log
management products simply aren’t designed to handle. Lexicon Big Data
can give this type of information context and meaning, and can forward the
data and/or associated alarms to third-party SIEMs so that they can use
valuable information they would otherwise ignore.
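The paper does not say which format Lexicon uses when forwarding data or alarms to third-party SIEMs. Purely as an assumed example, the sketch below formats an alarm in ArcSight’s Common Event Format (CEF), which many SIEMs can ingest. The function names and the vendor, product, version and event-class values passed in are hypothetical.

```python
def escape_header(value: str) -> str:
    """CEF header fields: backslash and pipe must be backslash-escaped."""
    return value.replace("\\", "\\\\").replace("|", "\\|")


def escape_extension(value: str) -> str:
    """CEF extension values: escape backslash, '=' and line breaks."""
    return (value.replace("\\", "\\\\").replace("=", "\\=")
            .replace("\r", "\\r").replace("\n", "\\n"))


def to_cef(vendor, product, version, event_class_id, name, severity, extensions):
    """Build one CEF:0 line; `extensions` maps CEF keys (e.g. src) to values."""
    header = [vendor, product, version, event_class_id, name, str(severity)]
    ext = " ".join(f"{key}={escape_extension(str(val))}"
                   for key, val in extensions.items())
    return "CEF:0|" + "|".join(escape_header(str(h)) for h in header) + "|" + ext


# Hypothetical alarm for a request to a known-malware host.
line = to_cef("Honeycomb", "Lexicon", "1.0", "malware-host", "Known malware host", 8,
              {"src": "10.0.0.5", "request": "http://bad.example.net/a?x=1"})
print(line)
# CEF:0|Honeycomb|Lexicon|1.0|malware-host|Known malware host|8|src=10.0.0.5 request=http://bad.example.net/a?x\=1
```

Escaping the backslash first in each helper matters: doing it last would double the backslashes that the other replacements had just added.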
System Requirements
The amount of system resources needed will, of course, depend on the amount of data being processed, how much of it is
filtered or retained, and how long it is retained.
The following guidelines indicate the system resources needed for a typical Data Mining environment:
Per 1TB of uncompressed data:
2x Lexicon Big Data Servers (can be virtual)
Each Lexicon Server should have:
• 4-16 CPU cores
• 48-192 GB RAM
• Minimum 800 GB disk storage (LAS, e.g. 10k-15k RPM SAS)
These figures assume 1 TB of uncompressed data, 100% retention (no filtering) and an unlimited retention period.
It is recommended that each system include a separate Analytics Engine instance for analysis, reporting, alerting and
similar tasks (the Analytics Engine software is included).
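The per-terabyte guidelines above can be turned into a quick sizing estimate. This sketch assumes the figures scale linearly and rounds the data volume up to whole terabytes; both are assumptions rather than vendor guidance, and the separate Analytics Engine instance is not included. The `estimate` function and `Sizing` type are hypothetical names.

```python
import math
from dataclasses import dataclass

# Per-1TB guideline figures from the System Requirements list.
SERVERS_PER_TB = 2
CORES_PER_SERVER = (4, 16)
RAM_GB_PER_SERVER = (48, 192)
MIN_DISK_GB_PER_SERVER = 800


@dataclass
class Sizing:
    servers: int
    cores: tuple
    ram_gb: tuple
    min_disk_gb: int


def estimate(uncompressed_tb: float) -> Sizing:
    """Scale the per-1TB guideline linearly (assumption), rounding up to
    whole terabytes; assumes 100% retention and no retention limit."""
    if uncompressed_tb <= 0:
        raise ValueError("data volume must be positive")
    servers = math.ceil(uncompressed_tb) * SERVERS_PER_TB
    return Sizing(
        servers=servers,
        cores=(servers * CORES_PER_SERVER[0], servers * CORES_PER_SERVER[1]),
        ram_gb=(servers * RAM_GB_PER_SERVER[0], servers * RAM_GB_PER_SERVER[1]),
        min_disk_gb=servers * MIN_DISK_GB_PER_SERVER,
    )


print(estimate(3.5))
# Sizing(servers=8, cores=(32, 128), ram_gb=(384, 1536), min_disk_gb=6400)
```

Under the same reading, each terabyte of uncompressed data implies at least 1,600 GB of disk across its two servers.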