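import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The statements below belong inside a method such as main().
// Sample input: four anchors, one per line (href values not shown).
String htmlLine = "<a href=\"...\">What We Know About…Health Literacy? [PDF-2.54MB]</a>\n";
htmlLine += "<a href=\"...\">What We Know About…Health Literacy? [PDF-2.54MB]</a>\n";
htmlLine += "<a href=\"...\">What We Know About…Health Literacy? [PDF-2.54MB]</a>\n";
htmlLine += "<a href=\"...\">What We Know About…Health Literacy? [PDF-2.54MB]</a>\n";

String expr = "<a([^>]*)>" // Element Detail - Group1
        + "([^<]+)"        // Element Data - Group2
        + "</a>";
// CASE_INSENSITIVE so that <A HREF="..."> is matched too, as the HREF check below expects.
int flags = Pattern.DOTALL | Pattern.UNIX_LINES | Pattern.CASE_INSENSITIVE;
Pattern patt = Pattern.compile(expr, flags);
Matcher m = patt.matcher(htmlLine);
List<String> anchors = new ArrayList<String>();
while (m.find()) {
    // Rebuild the complete anchor from its two captured groups.
    anchors.add("<a" + m.group(1) + ">" + m.group(2) + "</a>");
}

// get external links
String hrefExpr = "href=\"" + "([^\"]+)" + "\"[^>]*>" + "([^<]+)";
patt = Pattern.compile(hrefExpr, flags);
for (String anchor : anchors) {
    if (anchor.contains("href") || anchor.contains("HREF")) {
        m = patt.matcher(anchor);
        while (m.find()) {
            String href = m.group(1);
            // Treat .gov and localhost URLs as internal; print everything else.
            if (!href.contains(".gov") && !href.contains("localhost")) {
                System.out.println(anchor);
            }
        }
    }
}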
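The same two-step idea (match each anchor, then pull the href out of its attributes) can be packaged as a small reusable method. What follows is only a sketch under a few assumptions: the class name `AnchorExtractor`, the method `externalLinks`, and the `example.gov` / `example.com` URLs are invented for illustration and are not from the original post. It also returns the href values rather than printing the whole anchor, and it only handles double-quoted href attributes on anchors whose text contains no nested tags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here for illustration only.
final class AnchorExtractor {
    // Group 1: attributes of the opening tag; group 2: link text (no nested tags).
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\b([^>]*)>([^<]+)</a>", Pattern.CASE_INSENSITIVE);
    // Group 1: value of a double-quoted href attribute.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the href of every anchor whose URL contains none of the internal markers. */
    static List<String> externalLinks(String html, List<String> internalMarkers) {
        List<String> result = new ArrayList<>();
        Matcher anchor = ANCHOR.matcher(html);
        while (anchor.find()) {
            Matcher href = HREF.matcher(anchor.group(1));
            if (!href.find()) {
                continue; // anchor without an href, e.g. <a name="top">
            }
            String url = href.group(1);
            boolean internal = false;
            for (String marker : internalMarkers) {
                if (url.contains(marker)) {
                    internal = true;
                    break;
                }
            }
            if (!internal) {
                result.add(url);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input covering the four cases the filter has to handle.
        String html = "<a href=\"http://www.example.gov/a.pdf\">Report [PDF]</a>\n"
                + "<A HREF=\"http://www.example.com/b.html\">Partner site</A>\n"
                + "<a name=\"top\">Top</a>\n"
                + "<a href=\"http://localhost:8080/c\">Local</a>\n";
        System.out.println(externalLinks(html, Arrays.asList(".gov", "localhost")));
        // prints [http://www.example.com/b.html]
    }
}
```

Passing the internal markers as a list keeps the `.gov` and `localhost` rule from the snippet above configurable. For real-world pages with nested tags, single-quoted attributes, or malformed markup, a regular expression is fragile and an HTML parser is likely the safer choice.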
August 28, 2012
[Core Java] Using Regular Expressions to Extract Anchors from HTML