Parallel DFA processing

Applications based on Deterministic Finite Automata(DFA) are important for lexing in web browsers, routing in networks, decoding in cryptography, compression, and so on. The efficiency of these applications are often critical. But these applications are especially hard to parallelize due to strong dependences among states. The problem forms a major obstacle for them to benefit from the rapidly growing processor-level parallelism.

Recently, researchers start to explore the use of speculative execution for DFA-based applications and have shown some promising results. In this work, we go one step further to explore the inherent relations between the design of speculation schemes and the properties of DFA-based applications. We reveal some theoretical findings in the relations, and develop a probabilistic models-based approach to maximizing speculation benefits for DFA-based applications. Experiments confirm the uncovered properties and demonstrate the promise of the developed techniques.

Parallel HTML5 parsing

HTML parser is a basic component in modern web browsers. It takes a byte stream as input and outputs a Document Object Model(DOM). On one hand, the industry has began to design and implement parallel web browsers to take advantage of the increasing hardware parallelism (e.g., multicore processors).On the other hand, though the HTML has been widely used for decades, its parsing remains sequential. The "embarassingly sequential" parser prevents the parallelism that parallel web browsers can fully explore.

In this work, we target HTML5, the latest HTML specified by W3C and WHATWG. Its standard parsing algorithm has been supported by major web broswers, such as Chrome, Firefox and Safari. Our goal is to design and develop both pipelined and data parallel HTML5 parsers by addressing various complexities in HTML documents, such as lackness of formal grammar, automatic error correcting, embedded languages (e.g., Javascript and CSS), dynamic inputs(via document.write()), and strong dependences.

Program call sequence prediction

Program call sequence is a sequence of functions/methods that are invoked during program execution. It reflects one critical program behavior. Enabling program call sequence prediction has some immediately benefits, such as methods prefetching and call sequence-aware JIT optimization.

In this project, we introduce a new program representation, based on which a prediction model is built to take advantage of both static program source and dyamic execution statistics so that the program call sequence can be predicted at runtime.

"Remember, the brick walls are there for a reason. The brick walls are not there to keep us out. The brick walls are there to give us a chance to show how badly we want someting. Because the brick walls are there to stop the people who don't want it badly enough. They're there to stop the other people."
- Randy Pausch

contacts

Office: Room 101C, McGlothlin-Street Hall

Mailing Address:
Computer Science Department
College of William & Mary
P.O. Box 8795 Williamsburg, VA 23187-8795

Email: zzhao [at] cs.wm.edu

news

Android App
Mar 2013
FavoriteApps finds your favorite apps on the fly [download]

Android App
Mar 2013
SystemMonitor monitors system process/service/tasks [download]

Paper Accepted
Nov 2012
PPoPP13 accepts our paper on GPU memory optimizations

Internship
Jul 2012 - Sep 2012
Research intern @ Mozilla Corp.

Paper Accepted
Jul 2012
OOPSLA12 accepts our paper on loop sequence prediction

Poster Accepted
Jul 2012
PACT12 accepts our poster on parallel DFA processing