Unraveling the BitTorrent Ecosystem

Summarize in less than one page (font 11pt) the paper attached. Be concise but precise.


Unraveling the BitTorrent Ecosystem

Chao Zhang, Student Member, IEEE, Prithula Dhungel, Student Member, IEEE, Di Wu, Member, IEEE, and Keith W. Ross, Fellow, IEEE

Abstract: BitTorrent is the most successful open Internet application for content distribution. Despite its importance, both in terms of its footprint in the Internet and the influence it has on emerging P2P applications, the BitTorrent Ecosystem is only partially understood. We seek to provide a nearly complete picture of the entire public BitTorrent Ecosystem. To this end, we crawl five of the most popular torrent-discovery sites over a nine-month period, identifying all of the 4.6 million torrents and 38,996 trackers that the sites reference. We also develop a high-performance tracker crawler, and over a narrow window of 12 hours, crawl essentially all of the public Ecosystem’s trackers, obtaining peer lists for all referenced torrents. Complementing the torrent-discovery site and tracker crawling, we further crawl Azureus and Mainline DHTs for a random sample of torrents. Our resulting measurement data are more than an order of magnitude larger (in terms of number of torrents, trackers, or peers) than any earlier study. Using this extensive data set, we study in-depth the Ecosystem’s torrent-discovery, tracker, peer, user behavior, and content landscapes. For peer statistics, the analysis is based on one typical snapshot obtained over 12 hours. We further analyze the fragility of the Ecosystem upon the removal of its most important tracker service.

Index Terms: BitTorrent Ecosystem, peer-to-peer, content distribution, measurement.

1 INTRODUCTION

BitTorrent is a remarkably popular file distribution technology, with millions of users sharing content in hundreds of thousands of torrents on a daily basis. Even in the era of YouTube, BitTorrent traffic continues to grow at impressive rates.
For example, downloads of .torrent files from Mininova’s site doubled in 2008, to nearly 7 million downloads in a year [1]. BitTorrent has proven to be particularly effective at distributing large files, including open source software distributions.

Fundamental to BitTorrent’s success is its openness: the BitTorrent protocol has been published, and the source code of the baseline implementation has been made widely available. This openness has enabled developers to create over 50 independent BitTorrent client implementations [2], dozens of independent tracker implementations [3], and a multitude of torrent-discovery sites. The openness of the protocol has fostered productive discussions in both the online developer and the research communities, leading to further design improvements. All of this flourishing BitTorrent technology taken together forms the BitTorrent Ecosystem, consisting of millions of BitTorrent peers, hundreds of active trackers, and dozens of torrent-discovery sites (see Fig. 1).

BitTorrent is not only a thriving file distribution system, but also serves as a model for many successful live and on-demand P2P video deployments. About a dozen companies in China today (including Coolstreaming, PPLive, and ppstream) use the P2P paradigm to distribute live Chinese television channels, as well as live international content, to Internet users throughout the world. Most of these deployments are very similar to BitTorrent, with peers informing each other of the pieces they have, and the peers then downloading from each other their missing pieces [4]. There is also a multitude of companies today that are deploying P2P Video-on-Demand (VoD) [5]. To create these P2P VoD and live video systems, designers essentially began with the BitTorrent architecture, removed the tit-for-tat, and modified the scheduling algorithm to give priority to blocks that are to be played in the near future.

Many communities (including P2P researchers and designers, ISP researchers, copyright holders, and pedophilia and terrorist law enforcement agencies) would like to have a comprehensive and in-depth understanding of the Ecosystem in its entirety, as well as tools and methodologies for mapping the Ecosystem in the future. Despite its importance, both in terms of its footprint in the Internet and the influence it has on emerging P2P applications, the BitTorrent Ecosystem is only partially understood today. Although there are a few studies that provide limited insights, an up-to-date and comprehensive picture of the Ecosystem is woefully lacking.

However, because BitTorrent is an ecosystem involving hundreds of independently operated trackers and torrent-discovery sites (public and private), as well as millions of concurrently active peers (using many different client implementations), it is a major challenge to provide a complete snapshot that spans the entire Ecosystem. No single torrent-discovery website, tracker or ISP can provide the complete picture on its own.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 22, NO. 7, JULY 2011
C. Zhang, P. Dhungel, and K.W. Ross are with the Department of Computer Science and Engineering, Polytechnic Institute of NYU, Six MetroTech Center, Brooklyn, NY 11201. E-mail: [email protected], [email protected], [email protected].
D. Wu is with the Department of Computer Science, Sun Yat-Sen University, No. 132, Waihuan East Rd., Guangzhou Higher Education Mega Center, Guangzhou 510006, China. E-mail: [email protected].
Manuscript received 23 July 2009; revised 22 Dec. 2009; accepted 21 Apr. 2010; published online 1 June 2010. Recommended for acceptance by M. Parashar.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPDS-2009-07-0336. Digital Object Identifier no. 10.1109/TPDS.2010.123.
1045-9219/11/$26.00 © 2011 IEEE. Published by the IEEE Computer Society.

In this paper, we aim to provide a comprehensive and up-to-date picture of the BitTorrent Ecosystem. To keep our project manageable, we limit our attention to the public English-language BitTorrent ecosystem. We do this by crawling five of the most popular discovery sites over a nine-month period and identifying all of the torrents and trackers that the sites reference. Then, over a narrow window of 12 hours, we crawl all the trackers referenced by the discovery sites, obtaining peer lists for all referenced torrents. Our measurement data are more than an order of magnitude larger (in terms of number of torrents, trackers, or peers) than any earlier study [6], [7], [8]. The collected data have been anonymized and are publicly available to the research community (footnote 1). Using this extensive data set, we undertake an in-depth analysis, leading to a nearly complete picture of the entire public Ecosystem.

The contributions and some of the findings of this paper are as follows:

- Based on an asynchronous I/O design, we develop a high-performance multitracker crawler that simultaneously crawls thousands of trackers with concurrent TCP connections. The asynchronous I/O design provides a significant performance improvement over multithread designs, allowing us to obtain a snapshot of the Ecosystem’s millions of torrents in about 12 hours. We also develop a discovery-site crawler and adapt it to crawl five major torrent-discovery sites (Mininova, Pirate Bay, BTmonster, Torrent Reactor, and Torrent Portal).
- Using the .torrent files and metadata webpages from the five sites, we study the Ecosystem’s torrent-discovery landscape. We find that these five sites collectively index 4.6 million unique torrents, but only approximately 1.2 million of them are active.
We investigate the degree of indexing overlap among the sites, the characteristics and motivations of the users who upload .torrent files, and how the sites acquire .torrent files. We find that none of the major torrent sites on its own provides a complete picture of the Ecosystem.
- We study the Ecosystem’s tracker landscape. We identify almost 39,000 trackers, although only 728 (less than 2 percent) of these trackers are active. We determine the number of torrents and peers tracked by each of the active trackers. We find that the Top 20 tracker organizations are hosted in many continents, but with a high concentration in northern Europe. We find, for example, that 40 percent of the trackers track no more than four active torrents, and that only 190 trackers track more than 1,000 peers. Pirate Bay, operating the largest trackers, plays a disproportionate role in the Ecosystem.
- We study the Ecosystem’s peer landscape. Our analysis is based on one typical snapshot obtained within 12 hours. We find that the Ecosystem is rich in long-tail content, with many torrents being very small (82 percent have no more than 10 peers). Although the Ecosystem is dominated by mice, there are also elephants, with the largest torrents having more than 10,000 simultaneous peers. We investigate the number of torrents a peer joins simultaneously and the geographical distributions of the peers. We determine the countries in which BitTorrent has the highest usage per Internet user. We investigate the distribution of client types being used today, and determine that more than 50 percent of the peers today use uTorrent. We also study which clients are being used to create .torrent files and initialize torrents.
- We study the Ecosystem’s content landscape. By classifying each of the active torrents into one of 10 categories (movies, music, TV shows, pornography, and so on), we determine which content types are most popular in BitTorrent today.
We also perform a geographical analysis, determining in which countries movies, music, and pornography are most popular per Internet user. We analyze the size of the content files being distributed in the Ecosystem for each of the categories.
- Although we find that the Ecosystem is, in general, highly diverse, it also contains a major pillar, namely, Pirate Bay’s tracking service. We find that Pirate Bay currently tracks 90 percent of the torrents in the Ecosystem. We undertake an analysis of the Ecosystem’s fragility upon the removal of Pirate Bay, considering whether current DHT and PEX decentralized tracking services can pick up the slack.

Perhaps the most important message from this paper is a vivid and complete picture of the most successful open Internet application in the current decade.

This paper is organized as follows: Section 2 provides an overview of the BitTorrent Ecosystem. Section 3 describes the measurement methodology and scope. Section 4 provides the measurement results for the torrent-discovery, tracker, and peer landscapes. Section 5 provides content and geography classification for the torrents. The importance of Pirate Bay to the Ecosystem is analyzed in Section 6. Section 7 describes related work, and we conclude in Section 8.

Fig. 1. The BitTorrent Ecosystem (Note that Azureus is now called Vuze).

Footnote 1. Anonymized data available at http://cis.poly.edu/chao/bt-ecosys.html.

2 THE BITTORRENT ECOSYSTEM

As shown in Fig. 1, the BitTorrent ecosystem consists of three major components: peers, peer discovery mechanisms, and torrent-discovery sites. The collection of peers that participate in the distribution of a specific file at a given time is called a torrent. Each torrent is identified
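
The asynchronous I/O crawler design named in the contributions of Section 1 might be sketched roughly as below. This is only an illustration of the general idea (one thread, many concurrent requests, a cap on in-flight connections), not the authors' crawler. The names crawl_trackers, fetch_peers, and fake_fetch, the tracker labels, and the default limits are all hypothetical, and fake_fetch merely simulates a tracker reply without touching the network or the real tracker protocol.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List

# Hypothetical signature: takes a tracker identifier, returns the peers it reports.
FetchPeers = Callable[[str], Awaitable[List[str]]]


async def crawl_trackers(
    trackers: Iterable[str],
    fetch_peers: FetchPeers,
    max_concurrent: int = 1000,   # assumed cap, not a figure from the paper
    timeout_s: float = 10.0,      # assumed per-tracker timeout
) -> Dict[str, List[str]]:
    """Query many trackers concurrently from a single thread.

    Trackers that time out or refuse the connection map to an empty list.
    """
    limit = asyncio.Semaphore(max_concurrent)  # bounds in-flight requests

    async def query(tracker: str) -> tuple:
        async with limit:
            try:
                peers = await asyncio.wait_for(fetch_peers(tracker), timeout_s)
            except (asyncio.TimeoutError, OSError):
                peers = []  # treat an unresponsive tracker as inactive
            return tracker, peers

    # gather() schedules every query at once; the semaphore does the throttling.
    results = await asyncio.gather(*(query(t) for t in trackers))
    return dict(results)


async def fake_fetch(tracker: str) -> List[str]:
    """Stand-in for a real tracker request (assumption: no network is used)."""
    await asyncio.sleep(0.01)
    if tracker.endswith("dead"):
        raise OSError("connection refused")
    return [f"{tracker}/peer{i}" for i in range(2)]


snapshot = asyncio.run(crawl_trackers(["t1", "t2", "t3-dead"], fake_fetch))
```

In a real crawler, fake_fetch would be replaced by a coroutine that speaks the tracker protocol. The semaphore is one simple way to keep thousands of concurrent connections within operating-system limits without resorting to one thread per tracker.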
Answered 1 day after Feb 09, 2021

Deepti answered on Feb 11 2021
Summary
The paper aims to present a nearly complete view of the public English-language BitTorrent Ecosystem. It explains the BitTorrent Ecosystem in detail using a measurement infrastructure of crawlers and a storage system. Each of the three components of BitTorrent, namely peers, peer discovery mechanisms, and torrent-discovery sites, is described. This infrastructure is used to estimate the number of peers within a torrent. The challenges of developing such an infrastructure are listed. The analysis is done on five popular torrent-discovery sites using multi-crawler...