I am a 4th-year Ph.D. student in Computer Science at
College of William and Mary under the supervision of Prof.
Xipeng Shen.
I received my B.Sc. degree in Math and M.Eng. degree in Computer Science
from Harbin Institute of Technology,
China in 2007 and 2009 respectively.
I work on various program optimizations, especially on program parallelization
and program behaviors prediction.
Applications based on Deterministic Finite Automata(DFA) are important
for lexing in web browsers, routing in networks, decoding in cryptography,
compression, and so on. The efficiency of these applications are often
critical. But these applications are especially hard to parallelize due
to strong dependences among states. The problem forms a major obstacle
for them to benefit from the rapidly growing processor-level parallelism.
Recently, researchers start to explore the use of speculative execution
for DFA-based applications and have shown some promising results. In this
work, we go one step further to explore the inherent relations between
the design of speculation schemes and the properties of DFA-based applications.
We reveal some theoretical findings in the relations, and develop a probabilistic
models-based approach to maximizing speculation benefits for DFA-based
applications. Experiments confirm the uncovered properties and demonstrate
the promise of the developed techniques.
HTML parser is a basic component in modern web browsers. It takes a
byte stream as input and outputs a Document Object Model(DOM). On one hand,
the industry has began to design and implement parallel web browsers to take advantage
of the increasing hardware parallelism (e.g., multicore processors).On the other
hand, though the HTML has been widely used for decades, its parsing remains
sequential. The "embarassingly sequential" parser prevents the
parallelism that parallel web browsers can fully explore.
In this work, we target HTML5, the latest HTML specified by W3C and WHATWG.
Its standard parsing algorithm has been supported by major web broswers, such as
Chrome, Firefox and Safari. Our goal is to design and develop both pipelined and
data parallel HTML5 parsers by addressing various complexities in
HTML documents, such as lackness of formal grammar, automatic error correcting, embedded languages
(e.g., Javascript and CSS), dynamic inputs(via document.write()), and strong dependences.
Program call sequence prediction
Program call sequence is a sequence of functions/methods that are invoked during
program execution. It reflects one critical program behavior. Enabling program call sequence prediction
has some immediately benefits, such as methods prefetching and call sequence-aware JIT optimization.
In this project, we introduce a new program representation, based on which a prediction model
is built to take advantage of both static program source and dyamic execution statistics
so that the program call sequence can be predicted at runtime.
"Remember, the brick walls are there for a reason. The brick walls are
not there to keep us out. The brick walls are there to give us a chance
to show how badly we want someting. Because the brick walls are there to
stop the people who don't want it badly enough. They're there to stop the
other people."
- Randy Pausch