Hi there,


I'd like to ask whether you can handle my homework, which is described below. I would also like a quote: what would the price be? I remain available should any questions arise.


The task


Run a Nutch crawler (regardless of the version, e.g. 1 or 2) that is exhaustive (not specified in a seed.txt) and covers .lu domain websites only, going only 2 levels down (e.g. google.lu is level 1, and google.lu/contact, google.lu/images, etc. are level 2). It must run in a VirtualBox machine (Ubuntu 64-bit, with a hard disk larger than the default 10 GB, which proved too small, preferably 15 GB or 20 GB, and at least 4 GB of RAM). This VirtualBox machine is the deliverable I need, so that I can import it and run it on my Windows machine. Furthermore, the crawler needs to be configured to run in parallel, making use of a multi-core machine, and to be well optimized while complying with crawling politeness (no more than 2 or 3 crawled websites per minute). Nutch also needs to be integrated with Solr for indexing.
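
To make the scope and politeness rules concrete, here is a rough Python sketch of one possible reading of them. It is only an illustration, not Nutch configuration: the function names (`level_of`, `in_scope`, `min_delay_seconds`) are hypothetical, and it assumes that "level" means path depth (site root = level 1, one path segment = level 2) and that the limit of 2 or 3 per minute applies to fetches. If "level" is meant as link depth from the start page instead, the depth check would look different.

```python
from urllib.parse import urlsplit

# Assumption: level 1 is the site root, level 2 is one path segment below it.
MAX_LEVEL = 2


def level_of(url: str) -> int:
    """Return 1 for a site root, 2 for one path segment below it, and so on."""
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    return 1 + len(segments)


def in_scope(url: str) -> bool:
    """True if the URL is on a .lu host and no deeper than MAX_LEVEL."""
    host = urlsplit(url).hostname or ""
    return host.endswith(".lu") and level_of(url) <= MAX_LEVEL


def min_delay_seconds(max_fetches_per_minute: int) -> float:
    """Smallest pause between fetches that respects the per-minute limit."""
    return 60.0 / max_fetches_per_minute


if __name__ == "__main__":
    for url in ("https://google.lu", "https://google.lu/contact",
                "https://google.lu/a/b", "https://example.com/"):
        print(url, in_scope(url))
    print(min_delay_seconds(3), min_delay_seconds(2))
```

Under these assumptions, a limit of 2 or 3 per minute corresponds to a pause of 20 to 30 seconds between fetches. In Nutch itself the scope would normally be expressed through its URL filter rules and the pause through its fetcher delay setting rather than through code like this.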


Deadline - July 21


Delivery - VirtualBox that I can import and run from my computer


Regards
Jul 08, 2021