I have a weird issue that I'm at my wits' end about. Maybe a fresh set of eyes can point out the problem!
I'm using jsoup to parse an HTML file. The problem is that a set of tables is being output to the file 3-4 times when it is written to a fresh new file. The first time, the data is output as one straight line across the .csv file; every other time, it is formatted the way I want it to be. I want it right the first time, and I want it there only the first time!
My code:
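```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Class and method names are placeholders; the original snippet showed only the method body.
class TableParser {

    static void parseTables(File file) throws IOException {
        // null charset: jsoup reads it from a <meta> tag if present, else assumes UTF-8
        Document doc = Jsoup.parse(file, null);
        Elements tables = doc.select("table");
        for (Element table : tables) {
            Elements rows = table.select("tr");
            for (Element row : rows) {
                Elements cells = row.getElementsByTag("td");
                StringBuilder values = new StringBuilder();
                for (Element cell : cells) {
                    String cellText = cell.text();
                    cellText = cellText.replaceAll(",", "");
                    cellText = cellText.replaceAll("£", ",£");
                    cellText = cellText.replaceAll(",£", "£");
                    System.out.println(cellText);
                    values.append(cellText).append(",");
                }
                System.out.println(values.toString());
                addToFile(values + ",");
            }
        }
    }

    // Append one line of new data to the .csv output file.
    private static void addToFile(String myString) {
        try (BufferedWriter out = new BufferedWriter(new FileWriter("myparseddomtree.csv", true))) {
            out.write(myString + "\n");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```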
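Separate from the nested-table question, one thing worth ruling out in the code above: `addToFile` opens `myparseddomtree.csv` with `new FileWriter(..., true)`, i.e. in append mode, so every run of the program adds its rows after whatever an earlier run left in the file. A minimal sketch of the alternative, assuming the rows are first collected into a `List<String>` (the class and method names are hypothetical): write them all at once with `Files.write`, which creates the file or truncates an existing one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of jsoup or the original code.
class CsvWriterSketch {

    // Writes every collected row in one go. With no OpenOption arguments,
    // Files.write behaves as if CREATE, TRUNCATE_EXISTING and WRITE were given,
    // so re-running the program replaces the old output instead of appending to it.
    static void writeRows(Path csvFile, List<String> rows) throws IOException {
        Files.write(csvFile, rows, StandardCharsets.UTF_8);
    }
}
```

Calling something like `CsvWriterSketch.writeRows(Paths.get("myparseddomtree.csv"), rows)` once after the parsing loop would replace the per-row `addToFile` calls, and it also avoids reopening the file for every row.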
It may be a case of a complex HTML file, with various tables nested inside each other, but I don't see how that would cause tables whose data appears only once to be output 3 times...
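The nesting is a likely explanation. In jsoup, `doc.select("table")` returns every `<table>`, including the ones nested inside other tables, and both `table.select("tr")` and `row.getElementsByTag("td")` search all descendants, not just direct children. A row inside a table nested two levels deep is therefore returned for the outer table, for the middle table and for its own table, so it is written once per level of nesting. Meanwhile the outer row that wraps the inner tables gathers all of their cells into one long line, which matches the "one straight line" seen the first time. A minimal sketch of one way around this, assuming the data only lives in the innermost tables: skip any table that contains another table. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper illustrating a "leaf tables only" traversal.
class LeafTableRows {

    // True if no element below this table is itself a <table>.
    // getAllElements() also returns the table itself, so it is skipped explicitly.
    static boolean isLeafTable(Element table) {
        for (Element e : table.getAllElements()) {
            if (e != table && e.tagName().equals("table")) {
                return false;
            }
        }
        return true;
    }

    // One CSV line per row, taken only from tables with no nested tables,
    // so each <tr> and <td> is visited exactly once.
    static List<String> rowsAsCsv(Document doc) {
        List<String> lines = new ArrayList<>();
        for (Element table : doc.select("table")) {
            if (!isLeafTable(table)) {
                continue; // its descendant rows belong to the nested tables
            }
            for (Element row : table.select("tr")) {
                List<String> values = new ArrayList<>();
                for (Element cell : row.select("td")) {
                    values.add(cell.text().replace(",", "")); // drop thousands separators
                }
                lines.add(String.join(",", values));
            }
        }
        return lines;
    }
}
```

The trade-off is that rows living directly in an outer table (for example totals placed outside the inner tables) are skipped as well; if the real file has those, a per-row check for a nested `<table>` inside the row would be needed instead.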
Edit: here is a fragment of the HTML:
```html
<tr bgcolor = "#eeeeee" height = 20 >
  <td width = 15% > <font face="tahoma" size="1"> dept '<b>food incl vat</b>' </td>
  <td width = 10% align = right><font face="tahoma" size="1"> £688.95 </td>
  <td width = 10% align = right><font face="tahoma" size="1"> £642.60 </td>
  <td width = 10% align = right><font face="tahoma" size="1"> £767.95 </td>
  <td width = 10% align = right><font face="tahoma" size="1"> £3,007.00 </td>
  <td width = 10% align = right><font face="tahoma" size="1"> £1,525.60 </td>
  <td width = 10% align = right><font face="tahoma" size="1"> £1,970.40 </td>
  <td width = 10% align = right><font face="tahoma" size="1"> £353.00 </td>
  <td width = 1%></td>
  <td width = 14% align = right bgcolor = "#dfdfdf"><font face="tahoma" size="1" color = '#444444'> <b>£8,955.50</b></td>
</tr>
```
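Applying the question's per-cell cleanup to this fragment shows what ends up in each CSV field. Two details are worth noting: the fragment is a bare `<tr>`, so it has to be wrapped in `<table>` before parsing (jsoup follows the HTML5 parsing rules, which ignore `<tr>`/`<td>` tags found outside a table), and the pair `replaceAll("£", ",£")` / `replaceAll(",£", "£")` leaves the text unchanged once the commas have been removed. A minimal sketch; `FragmentRowToCsv` and its methods are hypothetical names, and `String.join` is used so that no trailing comma is written.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Hypothetical helper, not part of jsoup or the original code.
class FragmentRowToCsv {

    // Mirrors the question's cleanup: drop thousands separators.
    // The question's two replaceAll calls on the pound sign cancel each other
    // out once commas are gone, so they are omitted here.
    static String cleanCell(String text) {
        return text.replace(",", "");
    }

    // Converts the first <tr> in the given markup into one CSV line.
    // The bare <tr> is wrapped in <table> so the parser keeps the row and cells.
    static String rowToCsv(String trMarkup) {
        Element row = Jsoup.parse("<table>" + trMarkup + "</table>").select("tr").first();
        List<String> values = new ArrayList<>();
        for (Element cell : row.select("td")) {
            // text() joins the cell's text and trims surrounding whitespace
            values.add(cleanCell(cell.text()));
        }
        // String.join puts commas only between values, so no trailing comma
        return String.join(",", values);
    }
}
```

For the fragment above this yields one line with ten fields, including an empty one for the `width = 1%` spacer cell.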
Edit: sorry, I had an error in the code. It's fixed now.
I don't have enough of your code to make a solid guess, and I'm not sure why you are getting the size of the table and going through the table as many times as .size() returns (I'm guessing 3-4). You are going to want to find the root of the tables, then under that root find each table by name (the class name of the tables should be the same), and then search each table for whatever you want to find. Maybe something like this code :)
HTML:
```html
<ul class="listoftables">
  <li class="table"><span class="item"></span></li>
  <li class="table"><span class="item"></span></li>
  <li class="table"><span class="item"></span></li>
  <li class="table"><span class="item"></span></li>
</ul>
```
Java code:
```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public void searchForItems(Document doc) {
    Elements tables = doc.select("li[class=table]");
    for (Element table : tables) {
        Elements itemsInTable = table.select("span[class=item]");
        String item = itemsInTable.text();
        // Write item to the file. Depending on what is in the table, you might
        // have to write a more complex scan, looking for things like attributes.
    }
}
```