Wednesday, December 24, 2008

ORACLE TEXT: Document Classification

A classification application performs some action based on the content of a document. The action might be assigning a category ID to the document or sending the document to a user. Documents are classified according to predefined rules, and each rule selects documents for a category. For instance, a query rule of 'presidential elections' might select documents for a category about politics.

Oracle Text provides several types of classification. One type is simple, or rule-based, classification, discussed here, in which you create both the document categories and the rules for categorizing documents. With supervised classification, Oracle Text derives the rules from a set of training documents you provide. With clustering, Oracle Text does all the work for you, deriving both the rules and the categories.

Create the Rule Table


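-- Each row is one classification rule: query_id identifies the category,
-- query_string holds the Oracle Text query that selects documents for it.
CREATE TABLE queries (
  query_id     NUMBER,
  query_string VARCHAR2(80)
);

INSERT INTO queries VALUES (1, 'oracle');
INSERT INTO queries VALUES (2, 'larry or ellison');
INSERT INTO queries VALUES (3, 'oracle and text');
INSERT INTO queries VALUES (4, 'market share');
COMMIT;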

Create Your CTXRULE Index


-- Index the rule column; the CTXRULE index type treats each stored query as a classification rule
CREATE INDEX queryx ON queries(query_string) INDEXTYPE IS CTXRULE;
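By default an Oracle Text index is not updated as rows change, so rules added after the CTXRULE index is built are not seen by MATCHES until the index is synchronized. The sketch below assumes a hypothetical fifth rule, 'database', added to the same queries table; CTX_DDL.SYNC_INDEX then brings queryx up to date. EXEC is the SQL*Plus shorthand for an anonymous PL/SQL block.

```sql
-- Hypothetical new rule added after the index already exists
INSERT INTO queries VALUES (5, 'database');
COMMIT;

-- Synchronize the CTXRULE index so the new rule takes part in MATCHES
EXEC CTX_DDL.SYNC_INDEX('queryx');
```

Whether to sync after each change or on a schedule is a design choice: frequent syncs keep new rules active sooner but can fragment the index.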

Classify with MATCHES


COLUMN query_string FORMAT a35;
-- Return every rule whose query matches the given document text
SELECT query_id, query_string
  FROM queries
 WHERE MATCHES(query_string,
       'Oracle announced that its market share in databases increased over the last year.') > 0;
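To act on documents as they arrive (for example, assigning a category ID), the same MATCHES query can run inside a row trigger. This is a minimal sketch, not the author's setup: the tables news and news_categories and the trigger name news_classify are hypothetical, and each matching rule's query_id is recorded as the article's category.

```sql
-- Hypothetical article store
CREATE TABLE news (
  news_id   NUMBER PRIMARY KEY,
  news_text VARCHAR2(4000)
);

-- Hypothetical table recording which rules (categories) each article matched
CREATE TABLE news_categories (
  news_id  NUMBER,
  query_id NUMBER
);

-- Classify each new article: every rule in QUERIES that matches
-- the article text produces one row in NEWS_CATEGORIES
CREATE OR REPLACE TRIGGER news_classify
AFTER INSERT ON news
FOR EACH ROW
BEGIN
  FOR r IN (SELECT query_id
              FROM queries
             WHERE MATCHES(query_string, :NEW.news_text) > 0)
  LOOP
    INSERT INTO news_categories (news_id, query_id)
    VALUES (:NEW.news_id, r.query_id);
  END LOOP;
END;
/

-- Usage: insert an article, then look at the categories assigned to it
INSERT INTO news VALUES
  (1, 'Oracle announced that its market share in databases increased over the last year.');
SELECT news_id, query_id FROM news_categories;
```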

Related Topics:

  1. Example of Text Queries on Document Collections.
  2. Example of Queries on Catalog Information.
