Search Engine project

Kevin Fenzi kevin at scrye.com
Wed Feb 15 20:25:15 UTC 2012


...snip... 

I'm personally happy moving ahead with Datapark. The rest of them fail
for various (sometimes multiple) reasons. ;) 

If it turns out that it doesn't work too well or otherwise sucks we can
just retire it. I think it's well worth pursuing for now. 

Once we have a package, it should be easy to set up an instance, get a
full crawl done, and see how well it works out. 

Steps I see: 

- Package it and get that reviewed (I'm happy to review). 

- Set up a test instance

- Identify all the resources we want it to crawl and crawl them. 
  (We will need to adjust threads and such here, and may also need to
  adjust robots.txt to allow our crawler to crawl more.) Ideally, after
  a full crawl, it can do checks pretty quickly. 

- Adjust results 
	* May need to look at tagging pages or resources so they are
	  better described. 
	* May need to fix it so CSRF tokens aren't saved in results. 
	* May need to teach it what LANG some things are and favor
	  things from your current LANG.
	* May need to drop some results/sites out.  

- Theme search page (Sounds like there's a good start/possibly done
  version already). 

- Change search fields/add them
	* Change the wiki to call this. 
	* Possibly add a search field to all apps?

- Profit
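
As a rough illustration of two of the steps above (checking what
robots.txt lets a crawler fetch, and keeping CSRF tokens out of saved
results), here is a minimal Python sketch. It is not how Datapark does
either job. The parameter names in CSRF_PARAMS, the agent name
"hypothetical-crawler", and the example.org URLs are all assumptions
made up for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Assumed query-parameter names that carry CSRF tokens; the real
# applications may use different ones.
CSRF_PARAMS = {"csrf_token", "_csrf_token", "csrfmiddlewaretoken"}


def strip_csrf(url):
    """Return url with any assumed CSRF token parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in CSRF_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def crawler_allowed(robots_txt, agent, url):
    """Check whether the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(strip_csrf("https://example.org/page?id=3&_csrf_token=abc"))
    print(crawler_allowed(rules, "hypothetical-crawler",
                          "https://example.org/private/x"))
```

Running a check like crawler_allowed against each site's current
robots.txt before the first full crawl would show which paths need a
rule change, and a filter like strip_csrf would run on each URL before
it is stored.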

kevin

