Upgrade system: "Crawling Process: The real estate case"
Date
2011
Author
Version
Open access
Type
Final degree project
Abstract
Nowadays almost everyone uses, or has used, the web for one reason or another. Everybody is aware that the web constitutes an architecture for accessing information and retrieving data in the form of interconnected documents, which are distributed across millions of machines on the internet.
The most commonly used protocol for retrieving such documents is HTTP (Hypertext Transfer Protocol). When a user requests a document or some piece of information on the web, this protocol is enough to retrieve it. In this way the user can move through websites, retrieving each piece of information or document that is useful to them at any given time. The question raised here is what happens when one needs to retrieve millions or even billions of documents, or a large volume of information, whether for later processing or simply for reading. Given the constant growth in the volume of data on the web and the daily renewal of website contents, it is clearly impossible for a user to collect such a vast volume of data by hand, and it therefore becomes imperative to create mechanisms that automate this data retrieval procedure.
This is exactly the purpose of the present project: the design and implementation of a complete data collection system (web crawler), applied to the real estate sector in Greece.
More specifically, the implemented system retrieves data (advertisements) from the five most popular property websites. The ultimate goal of this implementation is to collect the advertisements from these websites in order to draw conclusions and produce statistical data on the overall picture of the property market in Greece. In addition, to meet the needs of this system, a database was designed and implemented to store the advertisement data retrieved by the crawling process.
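The abstract describes the crawler and its database only at a high level. The following is a minimal sketch, not the project's actual implementation, of how such a system could fit together in Python: a breadth-first crawl restricted to one site, link extraction with the standard-library `html.parser`, and storage in SQLite. The `ads` table, the `crawl` and `http_fetch` functions, and the rule that a URL containing `/ad/` is an advertisement page are all hypothetical assumptions; a real property website would need site-specific extraction of fields such as price and location, as well as politeness measures (respecting robots.txt, delaying between requests).

```python
import sqlite3
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAndTitleParser(HTMLParser):
    """Collects the href of every <a> tag and the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def http_fetch(url):
    """Default fetcher: a plain HTTP GET that returns the body as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, db, fetch=http_fetch, max_pages=100,
          is_ad=lambda url: "/ad/" in url):
    """Breadth-first crawl of a single site; advertisement pages go into SQLite.

    is_ad is a hypothetical rule: real sites need their own URL patterns.
    Returns the number of pages successfully fetched.
    """
    db.execute("CREATE TABLE IF NOT EXISTS ads (url TEXT PRIMARY KEY, title TEXT)")
    site = urlparse(start_url).netloc
    queue, seen = deque([start_url]), {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # urllib's URLError/HTTPError subclass OSError
            continue
        fetched += 1
        parser = LinkAndTitleParser()
        parser.feed(html)
        parser.close()
        if is_ad(url):
            # INSERT OR REPLACE keeps one row per URL if a page is re-crawled.
            db.execute("INSERT OR REPLACE INTO ads VALUES (?, ?)",
                       (url, parser.title.strip()))
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]  # drop fragments
            # Follow only links on the same host that were not queued before.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    db.commit()
    return fetched


# Offline demo: an in-memory "site" stands in for a real property portal.
pages = {
    "http://example.test/": '<a href="/ad/1">A</a> <a href="/ad/2">B</a>',
    "http://example.test/ad/1": "<title>Flat in Athens</title>",
    "http://example.test/ad/2": "<title>House in Patras</title><a href='/'>home</a>",
}
demo_db = sqlite3.connect(":memory:")
crawl("http://example.test/", demo_db, fetch=lambda u: pages[u])
print(demo_db.execute("SELECT url, title FROM ads ORDER BY url").fetchall())
```

Injecting `fetch` keeps the crawl logic testable offline; in a real run, `http_fetch` issues the HTTP GET requests that the abstract identifies as the basic retrieval mechanism, and the queue plus `seen` set is what turns single requests into an automated collection process.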
Subject
Information retrieval,
Data collection systems,
Web crawlers
Department
Universidad Pública de Navarra (Public University of Navarre). Department of Mathematical Engineering and Computer Science
Degree
Technical Engineering in Business Informatics (Ingeniería Técnica en Informática de Gestión)