The Evolution of Flow Data
I’ve been working with flow technologies for close to a decade… one might say I’m a little obsessed. What impresses me most about flow technology is how long it has remained relevant. More than two decades after NetFlow’s creation, vendors continue to innovate with it, developers still build products to collect it, and network engineers and security professionals still rely on it. Of particular interest to the ARIN community, the more recent, template-based versions (NetFlow v9 and IPFIX) let you collect statistics on your IPv6 traffic, which is useful for planning your migration, tracking which services are running on IPv6 and IPv4, comparing traffic volumes, and monitoring your network.
With this in mind, I want to pass on some knowledge, tips, and history about the protocol.
What is Flow Data?
The origin story traces back to Cisco Systems in the mid-’90s with the creation of NetFlow. Since then, ‘NetFlow’ has become an umbrella term that encompasses a variety of different iterations of the protocol (NetFlow v5, v9, IPFIX, Flexible NetFlow, NetStream, etc.). To determine which version a vendor has implemented, it is best to read the manual or perform a Google search.
One way to understand use cases for flow data is to draw a comparison to crime shows like Law and Order. When police start an investigation, they often examine the suspect’s phone records as a first step. Who did they call? What time was it? How long did the call last? Answering these questions helps the investigators find new leads and build a case.
Flow data is a phone record for network traffic. Collecting this data allows the operator to be the detective investigating the who, what, where, and when… pretty cool, right?
A more technical explanation is that the network equipment summarizes the traffic it forwards into flow records and sends them in packets over UDP. The flows are sent to a collector, where they are stored and visualized. Flow collection is all about observation points: a device can only report on traffic that actually passes through it. In the image below, the entire network is producing flows, giving the network operator complete visibility.
Flow data from entire network
If we take the same network and only enable flow data from the router, we can see a difference. In this example we see an Employee communicating both with a LAN-side resource (the green envelope) and with the Internet (the blue envelope). When the Internet-bound communication crosses the router, flow data is generated. The LAN-side communication never crosses the router, so the collector never sees it. Understanding this concept is critical when deciding which equipment to collect data from.
Reducing flow exports reduces visibility
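To make the "flow packets over UDP" part a little more concrete, here is a minimal sketch of what a collector does with a single NetFlow v5 datagram: read the 24-byte header, then unpack each fixed 48-byte record. The function names, the dictionary keys, and the listening port are my own choices for illustration, not part of any particular product, and a real collector would also handle v9/IPFIX templates, sampling, and storage.

```python
import ipaddress
import socket
import struct

# NetFlow v5 layout: a 24-byte header followed by `count` fixed 48-byte records.
V5_HEADER = struct.Struct("!HHIIIIBBH")
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")


def parse_netflow_v5(datagram):
    """Return the flow records in one NetFlow v5 export datagram as dicts."""
    version, count = V5_HEADER.unpack_from(datagram, 0)[:2]
    if version != 5:
        raise ValueError(f"expected NetFlow v5, got version {version}")
    flows = []
    for index in range(count):
        offset = V5_HEADER.size + index * V5_RECORD.size
        (src, dst, _next_hop, _in_if, _out_if, packets, octets,
         _first, _last, src_port, dst_port, _tcp_flags, protocol,
         _tos, src_as, dst_as, _src_mask, _dst_mask) = V5_RECORD.unpack_from(
            datagram, offset)
        flows.append({
            "src": str(ipaddress.IPv4Address(src)),
            "dst": str(ipaddress.IPv4Address(dst)),
            "src_port": src_port,
            "dst_port": dst_port,
            "proto": protocol,
            "packets": packets,
            "bytes": octets,
            "src_as": src_as,
            "dst_as": dst_as,
        })
    return flows


def listen(host="0.0.0.0", port=2055):
    """Print flows as they arrive. The port is an assumption: match your exporter."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            datagram, (exporter, _port) = sock.recvfrom(65535)
            for flow in parse_netflow_v5(datagram):
                print(exporter, flow)
```

Calling `listen()` on a host that a router exports v5 flows to would print one dictionary per flow. Notice that every record has the same shape and only has room for 4-byte addresses, which is worth keeping in mind for the IPv6 discussion later on.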
There are plenty of options to consider when selecting a flow collector. Commercial solutions, free solutions, and open-source solutions are all available. Team Cymru even provides a free cloud-based collector for ISPs and Hosting Providers called Nimbus™.
Data Enrichment
One of the best ways to maximize the value of flow data is to enrich it with metadata. Country or Autonomous System lookups are a great way to add context around where conversations are taking place. Generating alerts by comparing flow data with reputation feeds is a popular technique for threat reconnaissance.
Other methods include grouping by subnet, defining application ports, or using algorithms to generate alerts. Once you start experimenting with the technology, it becomes apparent there is plenty of room for creativity.
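As a rough illustration of the reputation-feed idea, the sketch below flags any flow whose source or destination falls inside a listed network. The flow dictionaries, the `flag_suspicious` name, and the feed format (a plain list of addresses or CIDR prefixes) are assumptions I made for the example. Real feeds come in many formats and are usually far too large for a linear scan like this one.

```python
import ipaddress


def flag_suspicious(flows, reputation_feed):
    """Return one alert per flow that touches a network listed in the feed."""
    bad_networks = [ipaddress.ip_network(entry) for entry in reputation_feed]
    alerts = []
    for flow in flows:
        for endpoint in ("src", "dst"):
            address = ipaddress.ip_address(flow[endpoint])
            if any(address in network for network in bad_networks):
                alerts.append({"flow": flow, "matched": endpoint})
                break  # one alert per flow is enough
    return alerts


# Hypothetical data using documentation address ranges.
example_flows = [
    {"src": "192.0.2.10", "dst": "203.0.113.5", "bytes": 100},
    {"src": "192.0.2.11", "dst": "192.0.2.20", "bytes": 200},
]
example_alerts = flag_suspicious(example_flows, ["203.0.113.0/24"])
```

Here only the first flow would be flagged, because its destination sits inside the listed prefix. The same loop could just as easily tag flows by subnet or by application port instead of raising an alert.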
The example below demonstrates grouping the IP information by Autonomous System. Taking this step provides a nice dashboard visualization and unlocks the ability to compare peer vs transit traffic.
Nimbus™ Autonomous System Report
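If you want to try the Autonomous System grouping yourself, the idea can be sketched in a few lines: map each address to an AS with a longest-prefix match, then total the bytes per AS. The prefix table below is made up (documentation prefixes and documentation AS numbers), and the function names are mine. In practice the mapping would come from BGP data or a lookup service rather than a hand-written dictionary.

```python
import ipaddress
from collections import Counter


def lookup_asn(address, prefix_table):
    """Longest-prefix match of an address against a {prefix: asn} table."""
    ip = ipaddress.ip_address(address)
    best = None
    for prefix, asn in prefix_table.items():
        network = ipaddress.ip_network(prefix)
        if ip in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, asn)
    return best[1] if best else None


def bytes_by_asn(flows, prefix_table):
    """Total the bytes of each flow under the AS of its destination address."""
    totals = Counter()
    for flow in flows:
        totals[lookup_asn(flow["dst"], prefix_table)] += flow["bytes"]
    return totals


# Hypothetical prefix-to-AS mapping and flows.
example_table = {
    "198.51.100.0/24": 64500,
    "198.51.100.128/25": 64501,
    "203.0.113.0/24": 64502,
}
example_flows = [
    {"dst": "198.51.100.7", "bytes": 1000},
    {"dst": "198.51.100.200", "bytes": 500},
    {"dst": "203.0.113.9", "bytes": 250},
    {"dst": "192.0.2.1", "bytes": 10},
]
example_totals = bytes_by_asn(example_flows, example_table)
```

Flows that match no prefix are collected under `None`, so nothing is dropped silently. Splitting the totals into peer and transit traffic would then be a matter of checking each AS number against your own list of peers.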
Flow Evolution – IPv6 monitoring
The most significant improvement to flow data came with the introduction of templates in NetFlow v9. Where v5 was a fixed format, v9 lets the exporter describe which fields each record carries, giving you the flexibility to choose them. This change also paved the way for the IETF standard IP Flow Information Export (IPFIX).
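Here is a small sketch of why templates matter. A template is essentially a list of (field type, length) pairs, and the collector uses it to slice up each data record. The field type numbers below are from the NetFlow v9 field definitions. The `decode_record` function and the simplified in-memory template are my own illustration, which skips the real packet framing (headers, flowsets, template IDs) entirely.

```python
import ipaddress

# A handful of NetFlow v9 field types; unknown types fall back to "field_<n>".
FIELD_NAMES = {
    1: "in_bytes",
    2: "in_pkts",
    4: "protocol",
    8: "ipv4_src_addr",
    12: "ipv4_dst_addr",
    27: "ipv6_src_addr",
    28: "ipv6_dst_addr",
}
ADDRESS_FIELDS = {8, 12, 27, 28}


def decode_record(template, payload):
    """Decode one data record using a list of (field_type, length) pairs."""
    record = {}
    offset = 0
    for field_type, length in template:
        raw = payload[offset:offset + length]
        if len(raw) != length:
            raise ValueError("payload is shorter than the template describes")
        name = FIELD_NAMES.get(field_type, f"field_{field_type}")
        if field_type in ADDRESS_FIELDS:
            record[name] = str(ipaddress.ip_address(raw))
        else:
            record[name] = int.from_bytes(raw, "big")
        offset += length
    return record


# A hypothetical template an exporter might announce for IPv6 traffic.
ipv6_template = [(27, 16), (28, 16), (4, 1), (1, 4)]
ipv6_payload = (
    ipaddress.IPv6Address("2001:db8::1").packed
    + ipaddress.IPv6Address("2001:db8::2").packed
    + bytes([6])
    + (1500).to_bytes(4, "big")
)
ipv6_record = decode_record(ipv6_template, ipv6_payload)
```

The decoder itself never changes. Swapping in a template with 4-byte address fields decodes IPv4 records, and a template with extra fields simply produces extra keys.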
With the ability to customize the data came statistics on IPv6 traffic. Adopters of IPv6 could now easily compare IPv6 and IPv4 traffic volumes, use reports to plan IPv6 migrations, and track what services were running on each protocol. In a v5 world that wasn’t possible, because the fixed record format only has room for IPv4 addresses, and that is why I find the change so significant.
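Once the collector has decoded flows for both address families, the comparisons above are simple bookkeeping. The sketch below assumes each flow is a dictionary with `src`, `bytes`, `proto`, and `dst_port` keys. Those names and the sample data are hypothetical, chosen only to show the arithmetic.

```python
import ipaddress


def volume_by_ip_version(flows):
    """Total bytes per IP version, keyed by 4 and 6."""
    totals = {4: 0, 6: 0}
    for flow in flows:
        totals[ipaddress.ip_address(flow["src"]).version] += flow["bytes"]
    return totals


def services_by_ip_version(flows):
    """Set of (protocol, destination port) pairs seen on each IP version."""
    services = {4: set(), 6: set()}
    for flow in flows:
        version = ipaddress.ip_address(flow["src"]).version
        services[version].add((flow["proto"], flow["dst_port"]))
    return services


def ipv6_share(totals):
    """IPv6 bytes as a percentage of all bytes (0.0 when there is no traffic)."""
    total = totals[4] + totals[6]
    return 100.0 * totals[6] / total if total else 0.0


# Hypothetical flows: one IPv4 HTTPS flow, one IPv6 HTTPS flow, one IPv6 DNS flow.
example_flows = [
    {"src": "192.0.2.10", "bytes": 3000, "proto": 6, "dst_port": 443},
    {"src": "2001:db8::10", "bytes": 600, "proto": 6, "dst_port": 443},
    {"src": "2001:db8::11", "bytes": 400, "proto": 17, "dst_port": 53},
]
example_totals = volume_by_ip_version(example_flows)
example_services = services_by_ip_version(example_flows)
```

With this sample data, IPv6 accounts for 1,000 of 4,000 bytes, so `ipv6_share(example_totals)` comes out to 25 percent. Tracking that number over time is one simple way to watch a migration progress.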
There are plenty of other doors that template-based flow data unlocked. Some of my favorite examples include layer 7 application attribution, performance metrics like latency or jitter, and the recent move by SD-WAN vendors to include elements that visualize how traffic is traversing the mesh. All good things.
Summary
Stay curious about what is possible. Flow data is often thought about as a simple tool that helps with bandwidth monitoring – but it’s so much more. The protocol is extremely rich, and vendors are continuing to push the limits of what’s possible. It’s worth your time to keep an eye on it, especially as you prepare for your IPv6 transition! Please don’t hesitate to contact me at Team Cymru with any questions.
Any views, positions, statements, or opinions of a guest blog post are those of the author alone and do not represent those of ARIN. ARIN does not guarantee the accuracy, completeness, or validity of any claims or statements, nor shall ARIN be liable for any representations, omissions, or errors contained in a guest blog post.