
Tutorial 10

E10 Parsing HTML Documents

This tutorial demonstrates the expressiveness of XMLBeam projections. Four Wikipedia pages about different programming languages are fetched, and the name and creator of each language are extracted.

* Shows a projection of HTML documents

Projection

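import org.xmlbeam.annotation.XBDocURL;
import org.xmlbeam.annotation.XBRead;

// {0} is replaced by the parameter passed to fromURLAnnotation(...)
@XBDocURL("http://en.wikipedia.org/wiki/{0}_(programming_language)")
public interface ProgrammingLanguage {

    // Text of the first <b> element in the document
    @XBRead("//b[1]")
    String getName();

    // Whitespace-normalized text of the table cell whose row header is "Designed by"
    @XBRead("normalize-space(//td[../th = \"Designed by\"])")
    String getCreator();

}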
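
To make the two XPath expressions easier to follow, here is a minimal sketch that evaluates them with the JDK's own javax.xml.xpath API instead of XMLBeam. The class name InfoboxXPathSketch, the helper evaluate, and the SAMPLE markup are assumptions made for illustration; the sample is a heavily simplified, well-formed stand-in for a Wikipedia infobox, not the real page.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class InfoboxXPathSketch {

    // Hypothetical sample markup (an assumption, not taken from Wikipedia).
    static final String SAMPLE =
          "<html><body><p><b>C</b> is a programming language.</p>"
        + "<table>"
        + "<tr><th>Paradigm</th><td>Imperative</td></tr>"
        + "<tr><th>Designed by</th><td>\n  Dennis   Ritchie \n</td></tr>"
        + "</table></body></html>";

    // Evaluates an XPath 1.0 expression against the XML text and returns its string value.
    static String evaluate(String expression, String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Same expressions as in the ProgrammingLanguage projection.
        System.out.println(evaluate("//b[1]", SAMPLE));
        System.out.println(evaluate("normalize-space(//td[../th = \"Designed by\"])", SAMPLE));
    }
}
```

On this sample the first expression yields "C" and the second yields "Dennis Ritchie". The predicate [../th = "Designed by"] keeps only the cell whose row header matches, and normalize-space trims the cell text and collapses its internal whitespace.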

Example Code

This is the complete code as a JUnit test case. Each page is loaded and parsed in a single line of code.

import java.io.IOException;

import org.junit.Test;
import org.xmlbeam.XBProjector;

public class TestWikiAccess extends TutorialTestCase {

    private static final String[] PROGRAMMING_LANGUAGES = { "Java", "C++", "C", "Scala" };

    @Test
    public void wikiIt() throws IOException {
        for (String name : PROGRAMMING_LANGUAGES) {
            // Fetch the page for this language and project it onto the interface.
            ProgrammingLanguage lang = new XBProjector().io().fromURLAnnotation(ProgrammingLanguage.class, name);
            System.out.println(lang.getCreator() + " designed " + lang.getName());
        }
    }
}
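
The language name passed to fromURLAnnotation fills the {0} placeholder of the @XBDocURL template. The sketch below only mimics that substitution with java.text.MessageFormat to show which URLs the loop would request. Whether XMLBeam resolves the placeholder in exactly this way is an assumption, and UrlTemplateSketch and resolve are hypothetical names.

```java
import java.text.MessageFormat;

public class UrlTemplateSketch {

    // Template copied from the @XBDocURL annotation of ProgrammingLanguage.
    static final String TEMPLATE = "http://en.wikipedia.org/wiki/{0}_(programming_language)";

    // Hypothetical helper: fills {0}, {1}, ... with the given parameters.
    static String resolve(String template, Object... params) {
        return MessageFormat.format(template, params);
    }

    public static void main(String[] args) {
        // Prints one URL per language used in the test case.
        for (String name : new String[] { "Java", "C++", "C", "Scala" }) {
            System.out.println(resolve(TEMPLATE, name));
        }
    }
}
```

For "Java" this produces http://en.wikipedia.org/wiki/Java_(programming_language). The parameter is inserted verbatim, so "C++" is not percent-encoded in this sketch.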

Test Output

James Gosling and Sun Microsystems designed Java
Bjarne Stroustrup designed C++
Dennis Ritchie designed C
Martin Odersky designed Scala