Raymond Russell, Chief Technology Officer & Co-Founder, Corvil
Modern trading systems operate at inhumanly fast speeds, but their behavior is difficult to predict and manage and, when they fail, they can fail catastrophically. Given the speed, complexity, and unpredictability of trading systems, the only way to manage them effectively is with other machines. Monitoring systems have been around for decades, based on periodically collecting raw counters from computers and network devices. However, this approach cannot begin to provide the required insight into, or control over, complex systems. For example, a key factor in the performance of trading systems is their latency, but simply counting that a system executed 10,000 transactions in the last minute tells you nothing about how long any of them took.
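To make the counter-versus-latency point concrete, here is a minimal sketch in Python. The function name `summarize_latencies`, the microsecond unit, and the sample figures are all illustrative assumptions, not data from any real trading system. It shows two minutes that report the same transaction count yet have very different tail latency.

```python
def summarize_latencies(latencies_us):
    """Summarize per-transaction latencies (hypothetical unit: microseconds)."""
    ordered = sorted(latencies_us)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile using integer arithmetic: rank = ceil(p * n / 100).
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    return {
        "count": n,
        "median_us": percentile(50),
        "p99_us": percentile(99),
        "max_us": ordered[-1],
    }


# Two made-up minutes, each with exactly 10,000 transactions.
healthy_minute = [100] * 10_000
degraded_minute = [100] * 9_800 + [50_000] * 200  # 2% of transactions are very slow

healthy = summarize_latencies(healthy_minute)
degraded = summarize_latencies(degraded_minute)
```

A raw counter would report 10,000 for both minutes. Only the per-transaction view exposes the slow tail in the second one, where the median is unchanged but the 99th percentile is 500 times higher.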
The only way to get a real handle on the behavior and performance of such complex systems is to capture every transaction as it happens, with an accurate record of the time at which each message effecting the transaction was exchanged. The best way to do this turns out to be on the networks that connect all the trading systems to each other; the network can copy data in flight across it without disturbing or delaying it, and we can use commodity network capture cards to capture it and timestamp it with nanosecond precision.
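As a rough illustration of what timestamped capture makes possible, the sketch below pairs each request with its response and derives a latency per transaction. The `CapturedMessage` record, its field names, and the "request"/"response" labels are hypothetical simplifications. Real capture data would need protocol decoding before it looked like this.

```python
from typing import Dict, Iterable, NamedTuple


class CapturedMessage(NamedTuple):
    """Hypothetical, already-decoded record of one message seen on the wire."""
    timestamp_ns: int      # capture timestamp in nanoseconds
    transaction_id: str    # identifier linking a request to its response
    kind: str              # "request" or "response"


def transaction_latencies(messages: Iterable[CapturedMessage]) -> Dict[str, int]:
    """Return latency in nanoseconds for every transaction with both messages."""
    pending: Dict[str, int] = {}
    latencies: Dict[str, int] = {}
    for msg in sorted(messages, key=lambda m: m.timestamp_ns):
        if msg.kind == "request":
            # Keep the first request timestamp seen for this transaction.
            pending.setdefault(msg.transaction_id, msg.timestamp_ns)
        elif msg.kind == "response" and msg.transaction_id in pending:
            latencies[msg.transaction_id] = msg.timestamp_ns - pending.pop(msg.transaction_id)
    return latencies


capture = [
    CapturedMessage(1_000, "T1", "request"),
    CapturedMessage(1_200, "T2", "request"),
    CapturedMessage(4_500, "T1", "response"),
    CapturedMessage(91_200, "T2", "response"),
    CapturedMessage(95_000, "T3", "request"),  # no response captured yet
]
result = transaction_latencies(capture)
```

Here `T2` took 90,000 ns against 3,500 ns for `T1`, and `T3` is absent from the result because it is still outstanding. Neither fact is recoverable from a count of messages.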
A huge additional boon to capturing on the network is that the network itself is a key component of the trading system, and the performance of the latter depends intimately on it. Often latency arises from the interaction of software with the network, not from either component separately, and the only way to detect and resolve the problem is by capturing software messages in the context of the network connections that carry them.
We have therefore developed machines that tap the network and passively capture all the trading activity taking place on it, constantly analyzing in real time the behavior of the trading systems and the networks that connect them.
But the performance of their systems is not the most pressing concern for CIOs in the trading business these days. Stability and safety are much larger concerns, in particular the levels of risk that the trading machines incur. This risk comprises both traditional trading dimensions (How big are the trading positions I have established? How much would it cost me to unwind them?) and technological dimensions (How likely is it that my trading systems misbehave? What controls do I have in place to catch this?).
These concerns place a whole new level of demands on the machines that watch the network: they must also be able to interpret all of the trading messages that are exchanged over it, so that they can not only stopwatch the trading systems but also understand their full behavior. Part of their remit is the ability to capture and provide a complete audit trail of the state of the business at any time, and they must also be able to analyze the trading behavior to detect and alert on dangerous trends. For example, these systems can capture buy or sell instructions, identify exactly when they were sent, and calculate the additional exposure the business takes on by having the orders in the market. They can also track all fills or partial fills against these orders, and update the aggregate trading position in real time.
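The order and fill tracking described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `PositionTracker` class, its method names, the single-instrument scope, integer prices in cents, and the simple definition of exposure as price times unfilled quantity. A production system would decode real trading protocols and use the firm's own risk definitions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OpenOrder:
    side: str        # "buy" or "sell"
    price: int       # hypothetical: price in cents, to avoid float rounding
    open_qty: int    # quantity still unfilled in the market


class PositionTracker:
    """Tracks open-order exposure and net position for one instrument."""

    def __init__(self) -> None:
        self.orders: Dict[str, OpenOrder] = {}
        self.position = 0  # signed net quantity: positive is long, negative is short

    def on_order(self, order_id: str, side: str, qty: int, price: int) -> None:
        if side not in ("buy", "sell"):
            raise ValueError(f"unknown side: {side}")
        self.orders[order_id] = OpenOrder(side, price, qty)

    def on_fill(self, order_id: str, fill_qty: int) -> None:
        order = self.orders[order_id]
        if fill_qty > order.open_qty:
            raise ValueError("fill exceeds open quantity")
        order.open_qty -= fill_qty
        self.position += fill_qty if order.side == "buy" else -fill_qty
        if order.open_qty == 0:
            del self.orders[order_id]  # fully filled, no longer in the market

    def open_exposure(self) -> int:
        # Simplified exposure: notional value of all unfilled quantity.
        return sum(o.price * o.open_qty for o in self.orders.values())


tracker = PositionTracker()
tracker.on_order("A", "buy", qty=100, price=1050)
exposure_after_order = tracker.open_exposure()
tracker.on_fill("A", 40)                      # partial fill
tracker.on_order("B", "sell", qty=30, price=1100)
tracker.on_fill("B", 30)                      # complete fill
```

After these events the net position is long 10, and the only remaining exposure is the 60 unfilled units of order `A`. An alerting rule could then be as simple as comparing `open_exposure()` or `position` against a configured limit on every update.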
Lessons for other industries
These are all fascinating developments but, as a CIO in a non-trading business, you may wonder what relevance they hold for you. The same basic principles are at play in all modern businesses: the success of your business depends on the performance of the IT infrastructure that it runs on. You may not need the same ultra-low-latency response times that electronic trading demands, but think about something as simple as a phone call: clear, reliable voice calls are essential, whether for dealing with your customers or for internal communications between your employees. Today, most voice calls are routed as VoIP (voice over IP) across shared data networks, where insufficient bandwidth or competing data transfers can cause dropped calls or garbled voices. Without the kind of visibility we discussed above, voice quality issues can persist for weeks or months without resolution, compromising your customer service levels and employee productivity.
The complexity and unpredictability that are rife in trading systems are far from unique to them. For example, modern e-commerce platforms are just as complex, with multiple distributed systems that all need to perform in concert. The difficulty that your shoppers have in completing their purchases on your website may be due to your webserver, your database, or your NAS; it might be due to problems with their bank's credit-card systems, or it might just be due to the shopper's poor internet connectivity. In all cases, you need to be alerted to the problem, know where the problem lies, and take the appropriate steps to fix it - all in real time.
All business today is becoming more and more dependent on software and machines: more of your own business is being automated, and your clients and counterparties are increasingly forcing you to interact with them electronically. As such, you are faced with challenges and risks analogous to those in electronic trading: "How do I know that my business is behaving as expected? How do I know that my machines are executing transactions safely and reliably? How can I detect and prevent rogue or fraudulent transactions?"
However, you can also benefit from the same approaches that have been developed in electronic trading; look for systems that:
● Attach passively or non-intrusively into your IT environment, ideally via tapping the network;
● Analyze both the network and all the application flows that traverse it;
● Can be flexibly extended to analyze all the applications and protocols that your environment hosts;
● Analyze and alert in real time, but also provide effective summarization for dashboarding and reporting;
● Present the data and analytics natively, and also feed them to other big-data and operational systems.