• The solution shall support role-based access control and shall integrate securely with OFR's existing Microsoft Active Directory authentication/authorization infrastructure (e.g., encrypted via Kerberos).
• The solution shall run on a system or systems whose operating system is either Red Hat Enterprise Linux Server 6 or Microsoft Windows Server 2012.
• The solution shall support a system that runs either on physical hardware or as a virtual machine on a VMware hypervisor.
• The solution shall support storage of data in a compressed format (e.g., GZIP, ZIP, or Snappy).
• The solution shall support automated data updates and shall include an API or other programmatic ability to update supported datasets as new data arrives on a periodic basis, so that the solution can integrate with OFR's custom-built, Linux- and Ruby-based automated data ingestion tool.
• The solution shall support standard remote SQL queries via ODBC or JDBC.
• The solution shall include the ability to create graphs, charts, and reports from raw or analyzed data.
• The solution shall support standard analytic functions for stock market domain-specific data.
• The solution shall include the facility to extract subsets of data.
Preferred Functionality:
• The solution should support integration with OFR's existing secure (kerberized) Cloudera Hadoop implementation.
• The solution should use OFR's existing secure (kerberized) Cloudera Hadoop implementation as a data storage backend.
• The solution should use OFR's existing secure (kerberized) Cloudera Hadoop implementation as a SQL-storage backend (e.g., HIVE/Impala).
• The solution should use OFR's existing secure (kerberized) Cloudera Hadoop implementation for analytics (e.g., YARN, Spark, MapReduce).
• The solution should allow auditing of user access to data stored or presented by the solution. This audit data should be written to a standard logging facility (e.g., syslog, log4j) that can be ingested by OFR's existing log analysis tool, Splunk. Auditing should be configurable so that only certain subsets of data presented by the solution are audited - either by auditing only those subsets or through a filtering component applied before logs are passed to Splunk.
• The solution should include a mechanism to share stored procedures, cleansing routines, analyses, charts, etc., between users - e.g., user 1 develops a workflow and user 2 can leverage that workflow as part of an expanded workflow.
• The solution should use a client-server model. The client piece should either be web-based or an application that can be virtualized with either Citrix XenApp or Microsoft App-V.
• Any RDBMS connectivity or required back-ends should support either Microsoft SQL Server (2012 or 2014) or Postgres EnterpriseDB (9.x).
• The system should support verification of ingested data updates - e.g., via MD5 or SHA checksums (see the sketch following this list).
• The system should support partitioning/parallelization of analytics/queries. OFR would prefer to leverage its existing Hadoop infrastructure or its High-Performance Computing infrastructure (SLURM-based batch queuing) for this functionality.
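As an illustrative sketch of the checksum verification mentioned above, in Python (the file name and expected digest are hypothetical placeholders, and SHA-256 stands in for whichever checksum the data vendor publishes):

# Sketch: verify an ingested data file against a vendor-supplied checksum.
import hashlib

def sha256sum(path, chunk_size=1 << 20):
    # Compute the SHA-256 digest of a file, reading in 1 MiB chunks.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "..."  # hypothetical digest published alongside the data update
if sha256sum("taq_update_20150415.csv.gz") != expected:
    raise ValueError("checksum mismatch: rejecting ingested update")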
Required Tick Database User Functionality:
• Platform-independent, performant data store
From the point of view of a SQL database, it should be simple and fast to fetch data using a query like:
SELECT * FROM taq WHERE TICKER='IBM' AND DATE='2015-04-15' AND TIME >= '09:30:00' AND TIME <= '16:00:00'
This functionality should be available across the range of supported statistical packages, including MATLAB, Python, R, and Excel.
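For illustration, a minimal sketch of the same query issued remotely from Python over ODBC (the DSN, credentials, and schema are hypothetical, not OFR specifics):

# Sketch: fetch one trading day of IBM ticks over a standard ODBC connection.
import pyodbc

conn = pyodbc.connect("DSN=tickdb;UID=analyst;PWD=...")  # hypothetical DSN
cursor = conn.cursor()
cursor.execute(
    "SELECT * FROM taq "
    "WHERE TICKER = ? AND DATE = ? AND TIME >= ? AND TIME <= ?",
    ("IBM", "2015-04-15", "09:30:00", "16:00:00"),
)
rows = cursor.fetchall()

MATLAB, R, and Excel can consume the same ODBC data source through their respective database interfaces; an equivalent JDBC connection would serve Java-based clients.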
• Data Cleaning
High-frequency data is often contaminated with errors - zero prices or trade sizes, or "spikes" where a single price or a block of prices sits far away from the prices before or after. These errors require filtering. The filters are typically implemented as leave-one-out moving estimators of the standard deviation or the mean absolute deviation. A set of such filters is needed to make tick data usable.
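A minimal sketch of one such filter, assuming a leave-one-out moving median with a median-absolute-deviation (MAD) dispersion estimate (the window length and threshold are illustrative choices, not prescribed values):

# Sketch: flag zero prices and "spikes" with a leave-one-out moving
# median/MAD filter.
import numpy as np

def spike_filter(prices, window=20, threshold=5.0):
    # Return a boolean mask of observations to keep.
    p = np.asarray(prices, dtype=float)
    keep = p > 0  # discard zero (or negative) prices outright
    half = window // 2
    for i in range(len(p)):
        lo, hi = max(0, i - half), min(len(p), i + half + 1)
        neighbors = np.delete(p[lo:hi], i - lo)  # leave observation i out
        med = np.median(neighbors)
        mad = np.median(np.abs(neighbors - med))
        if mad > 0 and abs(p[i] - med) > threshold * mad:
            keep[i] = False  # far from neighboring prices: likely a spike
    return keep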
• Aggregation
Tick data is vast, often exceeding 1,000,000 observations per instrument per day (counting both quotes and trades). Aggregation is therefore needed, using last-price interpolation, average-price interpolation, or HLOC (high/low/open/close) bars over a pre-specified time window (e.g., every 5 seconds or every 5 minutes).
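As a sketch of this kind of aggregation, assuming trade data in a pandas DataFrame indexed by timestamp with hypothetical "price" and "size" columns:

# Sketch: HLOC bars and last-price interpolation with pandas.
import pandas as pd

def to_bars(trades, freq="5min"):
    # Open/high/low/close bars plus total volume over each window.
    bars = trades["price"].resample(freq).ohlc()
    bars["volume"] = trades["size"].resample(freq).sum()
    return bars

def last_price(trades, freq="5s"):
    # Carry the last observed price forward onto a regular time grid.
    return trades["price"].resample(freq).last().ffill()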
• Cross-market integration
Many interesting questions require accessing assets across many markets. For example, analyzing the flash crash requires access to SPY from NYSE TAQ, ES from CME (via, e.g., Thomson Reuters or CFTC data, if available), and options data from OptionMetrics (IvyDB). These sources should be accessible in a time-aligned manner so that an overall view of trading across the exchanges can be readily analyzed. Similar cross-market aggregation would have been needed for October 2014, when important assets traded on ICAP (BrokerTec) and CME (Thomson Reuters or CFTC data, if available).
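A minimal sketch of such time alignment using an as-of merge (the timestamps and prices below are made up for illustration):

# Sketch: align SPY and ES streams so each SPY observation carries the
# most recent ES observation at or before it.
import pandas as pd

spy = pd.DataFrame({
    "timestamp": pd.to_datetime(["2010-05-06 14:45:00.10",
                                 "2010-05-06 14:45:00.35"]),
    "spy_price": [112.94, 112.62],  # made-up values
})
es = pd.DataFrame({
    "timestamp": pd.to_datetime(["2010-05-06 14:45:00.05",
                                 "2010-05-06 14:45:00.30"]),
    "es_price": [1120.25, 1117.00],  # made-up values
})

aligned = pd.merge_asof(spy, es, on="timestamp", direction="backward")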
• Contract/Treasury rolling
Futures trade for a fixed amount of time, with contracts typically expiring either monthly or quarterly. The vast majority of trading occurs in the front-month (or front-quarter) contract, so data is usually required from that contract. However, as a contract expires it is necessary to "roll" to the next contract in a smooth manner so that prices and volumes remain comparable. This requires knowledge of the futures contract structure and appropriate methods to splice the return series together.
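A minimal sketch of difference-based back-adjustment at an illustrative roll date (all dates and prices are made up):

# Sketch: splice an expiring contract and its successor into one
# continuous series by shifting the old contract's prices by the gap
# observed on the roll date.
import pandas as pd

front = pd.Series([100.0, 100.5, 101.0],
                  index=pd.to_datetime(["2015-03-17", "2015-03-18",
                                        "2015-03-19"]))
nxt = pd.Series([101.8, 102.1, 102.4],
                index=pd.to_datetime(["2015-03-19", "2015-03-20",
                                      "2015-03-23"]))

roll_date = pd.Timestamp("2015-03-19")
gap = nxt.loc[roll_date] - front.loc[roll_date]  # inter-contract price gap
continuous = pd.concat([front[front.index < roll_date] + gap,
                        nxt[nxt.index >= roll_date]])

Ratio-based adjustment (multiplying rather than shifting) is the usual alternative when returns, rather than price levels, must be preserved exactly.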
• Merge Quotes and Trades
Quotes and trades (transactions) are generally stored separately since they have different fields. It is common to merge the two databases so that each transaction keeps the last valid quote from N seconds before it, where N >= 0. It is also useful to apply the Lee & Ready algorithm to sign trades based on the side of the order book.
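A minimal sketch of this merge and of Lee & Ready signing (column names are assumptions; trades above the quote midpoint are marked buys, trades below it sells, and trades at the midpoint fall back to a tick test):

# Sketch: attach the last valid quote at least N seconds old to each
# trade, then sign trades with the Lee & Ready rule.
import numpy as np
import pandas as pd

def merge_and_sign(trades, quotes, lag_seconds=0.0):
    q = quotes.copy()
    # Delaying quote timestamps by N seconds makes a backward as-of merge
    # pick the last quote at least N seconds before each trade.
    q["time"] = q["time"] + pd.Timedelta(seconds=lag_seconds)
    merged = pd.merge_asof(trades.sort_values("time"),
                           q.sort_values("time"),
                           on="time", direction="backward")
    mid = (merged["bid"] + merged["ask"]) / 2.0
    sign = np.sign(merged["price"] - mid)  # +1 buy, -1 sell, 0 at midpoint
    # Tick test: at the midpoint, reuse the direction of the last price move.
    tick = np.sign(merged["price"].diff()).replace(0, np.nan).ffill()
    merged["side"] = np.where(sign != 0, sign, tick)
    return merged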
• Pull data directly into Excel
It should be easy to pull aggregated data straight into an Excel workbook using a macro that auto-updates. A native API would be helpful, along with connections to frequently used statistical software packages such as MATLAB, R, Stata, and SAS.
• User generated tables or views
Some mechanism to store cleaned data, especially if the data cleaning involves many rules and is slow.
• Portfolio Aggregates
Aggregating within the same database using pre-specified portfolio weights. For example, real-time construction of the S&P 500.
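A minimal sketch of such an aggregate (tickers, weights, and prices are illustrative; real index construction would also apply a divisor and handle corporate actions):

# Sketch: portfolio-level price from constituent last prices and
# pre-specified weights.
weights = {"AAPL": 0.04, "MSFT": 0.03, "XOM": 0.02}          # hypothetical
last_prices = {"AAPL": 127.10, "MSFT": 41.62, "XOM": 87.55}  # hypothetical

portfolio_level = sum(w * last_prices[t] for t, w in weights.items())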
• Order Book Reconstruction/Level 2 data
This is hypothetical, but reconstructing the order book from the message stream.
• Integrate external reference data - e.g. CRSP
Integrating with external reference data used to track firms across mergers and acquisitions or ticker changes.
• News-driven market analysis
Pulling trades/quotes within some time window of a news event.
• Helpdesk Assistance/Support
Access to product support on questions related to functionality and query development with reasonable response times.
• Manuals / Documentation
Documentation of functions must exist such that new users can easily learn and implement functions as needed.
RFI Response Instructions:
We anticipate responses will fall into one of the following three general categories, although we do not intend this list to limit responses or suggestions - alternatives are welcome.
1. A commercial, off-the-shelf (COTS) software product that uses OFR's existing Hadoop infrastructure as its data storage/presentation backend.
2. A commercial, off-the-shelf (COTS) software product that is self-contained.
3. A custom-built solution that uses OFR's existing Hadoop infrastructure as its data storage/presentation backend.
RFI responses shall be in the form of a Capability Statement or White Paper, no more than 10 pages (including graphs and charts) in no smaller than an 11-point font, that addresses:
1. Commercial off-the-shelf (COTS) software/product availability and ability to fulfill the above functionality. Describe any customization that would be required to fulfill the required functionality.
2. Company's profile and capability in providing these types of solutions, if any.
3. Business type (e.g., large business, small business, small disadvantaged business, woman-owned business, HUBZone small business) based upon North American Industry Classification System (NAICS) code 541512, Computer Systems Design Services. Please refer to Federal Acquisition Regulation (FAR) Part 19 for additional detailed information on Small Business Size Standards. The FAR is available at http://www.acquisition.gov/comp/far/index.html
4. Past Performance History/Experience in providing this area of expertise and other relevant information that would enhance the Government's understanding of the information submitted.
5. Budgetary estimates, including the pricing model for all services, hardware, and software licensing needed to implement the proposed solution.
6. Companies responding to this RFI may be invited to demonstrate their solutions.