TBDClust: time-based density clustering to enable free browsing of sites in pay-per-use mobile Internet providers

Authors: Torres García, Luis Miguel; Magaña Lizarrondo, Eduardo; Morató Osés, Daniel; García-Jiménez, Santiago; Izal Azcárate, Mikel
Repository record dates: 2019-02-21; 2019-02-21
Year: 2017
ISSN: 1084-8045
DOI: 10.1016/j.jnca.2017.10.007
URI: https://academica-e.unavarra.es/handle/2454/32351

Abstract: The World Wide Web has evolved rapidly, incorporating new content types and becoming more dynamic. The contents of a website can be distributed among several servers, and as a consequence, web traffic has become increasingly complex. From a network traffic perspective, it can be difficult to ascertain which websites a user is visiting, let alone which part of the user's traffic each website is responsible for. In this paper we present a method for identifying the TCP connections involved in the same full webpage download without the need for deep packet inspection. This identification is needed, for example, to enable free browsing of specific websites under pay-per-use mobile Internet access. Such free browsing could apply not only to websites promoted by third parties but also to portals for governmental or medical emergency websites. The proposal is based on a modification of the DBSCAN clustering algorithm to work online and over one-dimensional sorted data. To validate our results we use both real traffic and packet captures from a controlled environment. The proposal achieves excellent results in consistency (99%) and completeness (92%), meaning that it identifies webpage downloads with a minimal margin of error.

Extent: 11 p.
Format: application/pdf
Language: eng
Rights: © 2017 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license
Keywords: Clustering TCP connections; Time-based density clustering; DBSCAN; Mobile web browsing; Online monitoring; Real traffic dataset
Type: info:eu-repo/semantics/article
Access: info:eu-repo/semantics/openAccess
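
The abstract mentions density clustering that works online over one-dimensional sorted data. The following is only a minimal sketch of that general idea, not the TBDClust algorithm from the paper: it assumes the input is a non-decreasing stream of TCP connection start times in seconds, and it uses a simple gap rule (a new cluster starts when the gap to the previous timestamp exceeds `eps`). The names `cluster_stream`, `eps`, and `min_pts` are hypothetical and chosen for illustration.

```python
from typing import Iterable, Iterator, List


def cluster_stream(
    timestamps: Iterable[float], eps: float, min_pts: int = 2
) -> Iterator[List[float]]:
    """Yield groups of sorted timestamps, one element at a time (online).

    Assumptions (not taken from the paper):
    - timestamps arrive in non-decreasing order;
    - eps is the largest gap, in seconds, allowed inside one cluster;
    - groups with fewer than min_pts elements are discarded as noise.
    """
    current: List[float] = []
    for t in timestamps:
        # A gap larger than eps closes the cluster being built.
        if current and t - current[-1] > eps:
            if len(current) >= min_pts:
                yield current
            current = []
        current.append(t)
    # Flush the last open cluster when the stream ends.
    if len(current) >= min_pts:
        yield current
```

For instance, `list(cluster_stream([0.0, 0.2, 0.5, 10.0, 30.0, 30.1], eps=1.0))` groups the first three timestamps together, drops the isolated `10.0` as noise, and groups the last two. Because each timestamp is compared only with its predecessor, the sketch needs a single pass and constant work per element, which is what makes sorted one-dimensional data convenient for online processing.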