I want to parse the domain name out of URLs in Java. For example:
http://facebook.com/bartsf
http://www.facebook.com/pages/shine-communications/169790283042195
http://graph.facebook.com/100002306245454/picture?width=150&height=150
http://maps.google.com/maps?hl=en&q=37.78353+-122.39579
http://www.google.com/url?sa=x&q=http://www.onlinehaendler-news.de/interviews/1303-abba24-im-spagat-zwischen-haendler-und-kaeuferinteressen.html&ct=ga&cad=caeqargaiaaoataboafanqsqjwviavaawabiamrl&cd=xa_chwhng70&usg=afqjcnfmgnkzqn0fnkmfkz1ntkk1n9gg9a
Here is my code. I am writing it as part of a MapReduce job.
    String[] whitelist = {"www.facebook.com", "www.google.com"};
    // schemes is defined elsewhere (not shown)
    UrlValidator urlValidator = new UrlValidator(schemes);
    // read the file line by line; lines stands for the lines of the input file
    for (String line : lines) {
        String sCurrentLine = line;
        if (sCurrentLine.length() >= 3) {
            String tempString = sCurrentLine.substring(0, 3);
            // skip entries that start with a private IP prefix
            if (!tempString.equals("192") && !tempString.equals("172") && !tempString.equals("10.")) {
                sCurrentLine = "http://" + sCurrentLine;
                if (urlValidator.isValid(sCurrentLine)) { // the domain filter should go here
                    System.out.println(sCurrentLine);
                }
            }
            tempString = "";
        }
    }
I want a filter such that if the domain name is either facebook.com or google.com, the URL is filtered out. All of the URLs above should be filtered out.
Use java.net.URI to parse your strings as URIs. There's no need to reinvent the wheel here.
    URI foo = new URI("http://facebook.com/bartsf");
    String host = foo.getHost(); // "facebook.com"
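To get from the host to the filter asked about in the question, one option is to compare the host against each blocked domain and also accept any subdomain of it, so that graph.facebook.com and maps.google.com are caught too. The following is only a rough sketch under that assumption: the class name DomainFilter, the method isBlocked, the BLOCKED list, and the example.com URL are made up for illustration and are not part of the original code.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class DomainFilter {
    // Hypothetical list of domains to filter out, taken from the question.
    private static final List<String> BLOCKED = Arrays.asList("facebook.com", "google.com");

    /** Returns true if the URL's host is a blocked domain or a subdomain of one. */
    public static boolean isBlocked(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return false; // not parsable as a URI; left for other validation to reject
        }
        if (host == null) {
            return false; // e.g. a string with no scheme or authority part
        }
        host = host.toLowerCase(Locale.ROOT);
        for (String domain : BLOCKED) {
            // exact match, or a subdomain such as graph.facebook.com
            if (host.equals(domain) || host.endsWith("." + domain)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://facebook.com/bartsf",
            "http://maps.google.com/maps?hl=en&q=37.78353+-122.39579",
            "http://example.com/page" // hypothetical URL that should pass the filter
        };
        for (String url : urls) {
            if (!isBlocked(url)) {
                System.out.println(url);
            }
        }
    }
}
```

In the loop from the question, a call like isBlocked(sCurrentLine) could sit next to the urlValidator.isValid check. Matching on "." + domain rather than a plain endsWith(domain) keeps unrelated hosts such as notfacebook.com from being filtered by accident.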