Detecting negated terms in Chinese electronic medical records (EMRs) provides evidence for constructing a concept index. To extract terms from EMRs, we adopted an improved method that combines maximum matching with mutual information; this method mitigates the effect of overlapping ambiguity. To determine negative semantics, we adopted a second improved method that combines a rule-based approach with word co-occurrence, which reduces the likelihood of false-positive terms caused by punctuation input errors. The results showed that the negative predictive value was 7.85% higher than that of the rule-based method.
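As an illustration of how maximum matching and mutual information can be combined, the following is a minimal sketch and not the implementation evaluated here. It assumes that forward and backward maximum matching are both run, and that when they disagree the candidate whose multi-character words have the higher summed pointwise mutual information is kept. The function names, the add-one smoothing, the tie-breaking rule, and the toy lexicon and counts are all hypothetical.

```python
import math


def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation, longest lexicon word first."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # single characters always match
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, lexicon, max_len=4):
    """Greedy right-to-left segmentation, longest lexicon word first."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def pmi(x, y, unigram, bigram, total):
    """Pointwise mutual information of adjacent characters x, y.

    Add-one smoothing is an assumption made here to avoid log(0).
    """
    joint = bigram.get((x, y), 0) + 1
    cx = unigram.get(x, 0) + 1
    cy = unigram.get(y, 0) + 1
    return math.log2(joint * total / (cx * cy))


def cohesion(tokens, unigram, bigram, total):
    """Sum of PMI over adjacent character pairs inside multi-character words."""
    return sum(
        pmi(a, b, unigram, bigram, total)
        for tok in tokens if len(tok) > 1
        for a, b in zip(tok, tok[1:])
    )


def segment(text, lexicon, unigram, bigram, total, max_len=4):
    """Return the forward or backward result, whichever is more cohesive."""
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    if fwd == bwd:
        return fwd
    # Ties go to the forward result (an arbitrary choice in this sketch).
    if cohesion(fwd, unigram, bigram, total) >= cohesion(bwd, unigram, bigram, total):
        return fwd
    return bwd
```

With a toy lexicon containing both 发热 and 热病, the string 发热病 is an overlapping ambiguity: forward matching yields 发热 / 病 and backward matching yields 发 / 热病. Under these assumptions, `segment` keeps whichever split pairs the characters that co-occur more strongly in the supplied counts.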
Clinic expert information is an important reference for residents in need of hospital care. Such information is usually hidden in the deep web and cannot be indexed directly by search engines. To extract clinic expert information from the deep web, the first challenge is to classify forms. This paper proposes a novel method based on a domain model, a tree structure constructed from the attributes of search interfaces. With this model, search interfaces can be classified into a domain and filled in with domain keywords. The second challenge is to extract information from the web pages returned by search interfaces. To filter out noise on a web page, a block importance model is proposed. The experimental results indicated that the domain model yielded a precision 10.83% higher than that of the rule-based method, whereas the block importance model yielded an F1 measure 10.5% higher than that of the XPath method.
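To make the idea of a tree-structured domain model concrete, the following is a minimal sketch of classifying a search form by matching its field labels against attribute nodes. It is not the model proposed here: the `AttributeNode` class, the overlap score, the 0.5 threshold, and the example attributes are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeNode:
    """One attribute in a (hypothetical) domain model tree."""
    name: str
    synonyms: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def iter_nodes(self):
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.iter_nodes()


def match_score(form_labels, root):
    """Fraction of non-empty form labels that match a node name or synonym."""
    vocab = set()
    for node in root.iter_nodes():
        vocab.add(node.name.lower())
        vocab.update(s.lower() for s in node.synonyms)
    labels = [label.strip().lower() for label in form_labels if label.strip()]
    if not labels:
        return 0.0
    hits = sum(1 for label in labels if label in vocab)
    return hits / len(labels)


def belongs_to_domain(form_labels, root, threshold=0.5):
    """Assumed decision rule: accept the form when enough labels match."""
    return match_score(form_labels, root) >= threshold


# Toy domain model; the attribute names are illustrative only.
clinic_domain = AttributeNode(
    "clinic expert",
    children=[
        AttributeNode("department", synonyms={"dept"}),
        AttributeNode("doctor name", synonyms={"physician", "expert name"}),
        AttributeNode("title"),
    ],
)
```

Under these assumptions, a form whose fields are labelled "Department" and "Physician" would be accepted, while a login form with "Username" and "Password" would be rejected. A real system would likely need fuzzier label matching than exact string comparison.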