The demand on cloud providers to manage their clients’ big data has created a need for gigabit-speed connections and software that supports terabyte-scale electronic transfers. Beyond consumer and industry applications, scientific applications such as CERN’s particle collider, SETI, and genomic sequencing would benefit from a tool for transmitting terabytes of data. GPSonFlow constrains transmission to off-peak hours, thereby minimizing costs and network congestion. Off-peak hours occur in the early morning when users are asleep, so they do not overlap between geographically separated users in different time zones; data are therefore transmitted to store-and-forward hops that are time-proximal to the sender or receiver. The GPSonFlow tool finds the geographic position and bandwidth of suitable hops and computes a transmission schedule optimized for off-peak hours.
The unique feature of this tool is its ability to automatically construct the model and compute the optimal location and bandwidth of storage hops for fast, inexpensive transmission of terabytes of data from a sender’s network to a receiver’s. Transmission occurs during low-congestion times, exploiting unused bandwidth that would otherwise be wasted. The size of the model depends on the duration of the transfer. The model can generate transmission schedules via data centers or via a crowd of client machines, and it is constructed automatically from a minimal number of input parameters. GPSonFlow is the only tool that maps graph-time to real-time (UTC time zones) and integrates this mapping into its graph construction and graph search algorithms. While the model construction algorithm has low complexity, the search algorithm that computes the transmission schedule has high complexity; a lower-complexity search algorithm is under development.
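The core observation above is that the off-peak windows of a sender and receiver in different time zones need not coincide, which is why store-and-forward hops are needed. The sketch below illustrates this in Python under assumptions not taken from the source: off-peak is taken to be 01:00–05:00 local time, and the example cities are arbitrary. GPSonFlow’s actual model is more elaborate; this only shows the graph-time-to-UTC mapping idea.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# Assumed off-peak window: 01:00-05:00 local time (illustrative, not from GPSonFlow).
def offpeak_utc(tz_name, day, start_hour=1, end_hour=5):
    """Return the local off-peak window for a given calendar day as UTC datetimes."""
    tz = ZoneInfo(tz_name)
    start = datetime(day.year, day.month, day.day, start_hour, tzinfo=tz)
    end = datetime(day.year, day.month, day.day, end_hour, tzinfo=tz)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)

day = datetime(2018, 12, 3)
sender = offpeak_utc("America/New_York", day)   # 06:00-10:00 UTC on this date
receiver = offpeak_utc("Asia/Kolkata", day)     # 19:30-23:30 UTC the previous day

# Two intervals overlap only if each starts before the other ends.
overlap = sender[0] < receiver[1] and receiver[0] < sender[1]
print(overlap)  # prints False: the windows are disjoint, so a hop is needed
```

Because the two windows are disjoint in UTC, the sender can only push data off-peak to an intermediate hop, which later forwards it during the receiver’s off-peak window.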
Currently, shipping portable storage devices is the cheapest and fastest method for transferring terabyte-scale data; examples include Amazon’s Snowball, Microsoft’s Azure Data Box, and Google’s Transfer Appliance. Both industry and academia are analyzing and developing hardware and software for large transfers. With wider deployment of gigabit connections, electronic terabyte-scale transfers are expected to become available to ordinary internet users. What is not expected to change is the shared nature of the internet and the scarcity of bandwidth. Since terabyte-scale transfers are bandwidth intensive, a transfer tool such as GPSonFlow, which ensures all transmissions occur during low-congestion times when there is ample unused bandwidth, would be both fast and cost-effective.
- Large data sets are transmitted from sender to receiver indirectly via data centers or client machines that serve as store-and-forward hops
- Lowers the cost of large data transfers by taking off-peak rates into account when computing the transmission schedule
- Automatically constructs the app-level flow model of the Internet from a minimal set of user-supplied parameter values
- Big Data Transfer
- Internet Flow Control
- Evaluation Models
Stage of Development
Lead Innovator: Elizabeth Varki, Ph.D.
Dr. Varki is an associate professor in the Department of Computer Science at the University of New Hampshire with expertise in performance evaluation. She received her Ph.D. from Vanderbilt University in 1997 and received the Faculty Early Career Development (CAREER) Award from the National Science Foundation in 2001.
- December 2018
GPSonflow: Geographic Positioning of Storage for Optimal Nice Flow
RAIDX: RAID Without Striping
Improve Prefetch Performance by Splitting the Cache Replacement Queue
- 23 September 2010
Sequential Prefetch Cache Sizing for Maximal Hit Rate
- May 2008
Co-allocation in Data Grids: A Global, Multi-user Perspective
- 04 May 2004
Issues and challenges in the performance analysis of real disk arrays
- November 2001
Response time analysis of parallel computer and storage systems