Specification of Diqt Version 1

Copyright (C) 2000-2006 Mikio Hirabayashi
Last Update: Sat, 15 Apr 2006 15:55:29 +0900

Table of Contents

  1. Overview
  2. Installation
  3. How to Search
  4. Frequently Asked Questions
  5. Copying

Overview

Diqt is a WWW-based multilingual dictionary reference tool. That is, dictionaries of many languages can be searched using a web browser. Any language can be used as long as you have dictionary data for it. For example, you can search English-Japanese, English-German, English-French, and Japanese-English dictionaries at the same time.


Installation

Preparation

Diqt works on most UNIX and UNIX-compatible operating systems. To install Diqt from a source package, GCC 2.8 or later and `make' are required.

After extracting the Diqt archive file, change the current working directory to the generated directory and perform the installation.

Building

Run the configuration script.

./configure

Build programs.

make

When the build finishes, `mkdiqt' and `search.cgi' will have been generated.

Installation

Diqt works on a WWW server. Prepare a WWW server that provides CGI, such as Apache. Refer to the manual of your WWW server for how to set up the CGI environment.

Copy `search.cgi' into a directory where CGI scripts are available. Configuration files are also placed there.

Making databases

Make dictionary databases by importing TSV (Tab Separated Values) files. Each line of a TSV file is composed of two fields. The first field is the index. The second field is the descriptive text. For example, an English-English dictionary looks like the following. The separator character of each line is a tab, not a space.

pearl   gem in the shells of certain mollusks
Perl    practical extraction and report language
Ruby    object-oriented scripting language
ruby    gem of deep red mineral corundum
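
Because tabs and spaces look alike in most editors, it can help to check a TSV file before importing it. The following is a minimal sketch and not part of Diqt; the file name `sample.tsv' and the function name `check_tsv' are assumptions chosen for illustration.

```shell
#!/bin/sh
# Write a small sample dictionary; printf expands \t to a real tab.
printf 'pearl\tgem in the shells of certain mollusks\n' > sample.tsv
printf 'Perl\tpractical extraction and report language\n' >> sample.tsv

# check_tsv FILE: print the line number of every line that does not
# consist of exactly two tab-separated fields; fail if any was found.
check_tsv() {
  awk -F '\t' 'NF != 2 { print NR; bad = 1 } END { exit bad + 0 }' "$1"
}

check_tsv sample.tsv && echo "sample.tsv looks well-formed"
```

A line whose fields are separated by spaces counts as a single field, so it is reported by its line number.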

The character encoding of a dictionary should be compatible with US-ASCII. At least UTF-8, ISO-8859-1, and EUC-JP are allowed. If you use dictionaries of two or more languages, it is necessary to unify their encodings. If you use Japanese only, EUC-JP is suggested. If you use major European languages, ISO-8859-1 is suggested. If you use other languages around the world, UTF-8 is suggested.
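
When dictionaries arrive in different encodings, one way to unify them is the `iconv' command, which is a separate tool and not part of Diqt. The sketch below converts a TSV file from ISO-8859-1 to UTF-8; the file names `latin1.tsv' and `utf8.tsv' and the sample entry are hypothetical.

```shell
#!/bin/sh
# Create a one-line ISO-8859-1 sample: \351 is the single byte 0xE9,
# which is e-acute in ISO-8859-1.
printf 'caf\351\tcoffee house\n' > latin1.tsv

# Convert to UTF-8 so that the file can share one encoding with the
# other dictionaries before it is imported.
iconv -f ISO-8859-1 -t UTF-8 latin1.tsv > utf8.tsv
```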

The command `mkdiqt' is used to convert a TSV file to a database file of Diqt. The first argument specifies the name of the dictionary file. The TSV file is read from the standard input. For example, to make `English-Japanese' from `eiwa.tsv' and `Japanese-English' from `waei.tsv', run the following commands.

mkdiqt English-Japanese < eiwa.tsv
mkdiqt Japanese-English < waei.tsv

Making configuration files

Make a configuration file named `search.conf'. As in the following example, its contents specify the character encoding, the language, the title, and the names of the dictionaries. `list' and `path' are optional.

encoding: UTF-8
lang: ja
title: English-Japanese/Japanese-English Dictionaries
dict: English-Japanese
dict: Japanese-English
auto: English-Japanese,Japanese-English
list: list.txt
path: /bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/home/mikio/bin

If `auto' is specified, the automatic dictionary selector is enabled. Its value is two dictionary names separated by a comma. If the input phrase is composed of ASCII characters only, the first name is selected; otherwise, the second name is selected.
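
The selection rule can be mimicked in a few lines of shell. This is only a sketch based on the description above, not Diqt's implementation; the function name `pick_dict' is hypothetical.

```shell
#!/bin/sh
# pick_dict PHRASE AUTO: print the first comma-separated field of AUTO
# if PHRASE contains only ASCII bytes, otherwise print the second field.
pick_dict() {
  phrase=$1
  auto=$2
  # In the C locale, tr -d removes every byte in the range 0-127
  # (octal \000-\177); anything left over is a non-ASCII byte.
  rest=$(printf '%s' "$phrase" | LC_ALL=C tr -d '\000-\177')
  if [ -z "$rest" ]; then
    printf '%s\n' "${auto%%,*}"
  else
    printf '%s\n' "${auto#*,}"
  fi
}

pick_dict 'pearl' 'English-Japanese,Japanese-English'   # prints English-Japanese
```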

If `list' is specified, approximate matching with `agrep' is enabled. Save candidate words into the specified file. `path' specifies the command search path for `agrep'.
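
One plausible way to produce the candidate file is to take the index field of the TSV files. The sketch below assumes that the list holds one candidate word per line, which this specification does not state; the file name `dict.tsv' is hypothetical, and `list.txt' matches the sample configuration above.

```shell
#!/bin/sh
# Sample TSV input, using the entries of the example dictionary.
printf 'pearl\tgem in the shells of certain mollusks\n' > dict.tsv
printf 'Perl\tpractical extraction and report language\n' >> dict.tsv
printf 'Ruby\tobject-oriented scripting language\n' >> dict.tsv
printf 'ruby\tgem of deep red mineral corundum\n' >> dict.tsv

# Take the first (index) field of every line, drop duplicates, and
# save the result as the candidate word list.
cut -f 1 dict.tsv | LC_ALL=C sort -u > list.txt
```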

Make a style-sheet file named `search.css'. This specifies the style of the page. In addition, make an HTML file named `search.top'. Its contents are included under the input form in the page when no search phrase is specified.

Configuration files and dictionary files should be placed in the same directory as `search.cgi'. Refer to the samples included in the archive file of Diqt.
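
A quick check of the layout can catch a misplaced dictionary before the CGI script is tried. The following sketch is not part of Diqt; the function name `check_dicts' is hypothetical, and it assumes that each database file carries exactly the name given to `mkdiqt' and listed on a `dict' line.

```shell
#!/bin/sh
# check_dicts DIR: read DIR/search.conf and report every dictionary
# named on a "dict:" line that has no entry of that name in DIR.
# Returns non-zero if at least one dictionary is missing.
check_dicts() {
  dir=$1
  rc=0
  while IFS= read -r name; do
    if [ ! -e "$dir/$name" ]; then
      echo "missing dictionary: $name"
      rc=1
    fi
  done <<EOF
$(sed -n 's/^dict:[[:space:]]*//p' "$dir/search.conf")
EOF
  return $rc
}
```

It would be called with the directory that holds `search.cgi', for example `check_dicts .' when run from that directory.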


How to Search

Access the URL of `search.cgi' using a Web browser.

Input a search phrase into the text field of the page. The select box that follows specifies the dictionaries to be searched. The next select box specifies the search mode. `Forward Matching' searches for terms beginning with the phrase. `Backward Matching' searches for terms ending with the phrase. `Include Matching' searches for terms including the phrase. `Full Matching' searches for terms exactly equal to the phrase. `Full Text Search' scans both the indexes and the descriptive texts. The last select box specifies the maximum number of terms to be shown.

`Forward Matching' and `Full Matching' are very fast because the index of the database is used. `Backward Matching' and `Include Matching' are slow because every index is scanned. `Full Text Search' is slower still because every index and every descriptive text is scanned.
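
The meaning of each mode can be shown with a small filter over a TSV file. This is a sketch only: it scans the file linearly and compares case-sensitively, which are simplifying assumptions, so it says nothing about how Diqt uses its database index. The function name `match_terms', the mode keywords, and the file name `demo.tsv' are hypothetical.

```shell
#!/bin/sh
# match_terms MODE PHRASE FILE: print the index terms of a TSV FILE
# that match PHRASE under MODE (forward, backward, include, full,
# or fulltext).
match_terms() {
  awk -F '\t' -v mode="$1" -v p="$2" '
    {
      t = $1 ""                 # force string (not numeric) comparison
      pos = index(t, p)         # 0 if p does not occur in t
      tail = substr(t, length(t) - length(p) + 1)
      if (mode == "forward")
        hit = (pos == 1)
      else if (mode == "backward")
        hit = (length(p) <= length(t) && tail == p)
      else if (mode == "include")
        hit = (pos > 0)
      else if (mode == "full")
        hit = (t == p)
      else if (mode == "fulltext")
        hit = (pos > 0 || index($2, p) > 0)
      else
        hit = 0
      if (hit) print t
    }' "$3"
}

printf 'pearl\tgem in the shells of certain mollusks\n' > demo.tsv
printf 'ruby\tgem of deep red mineral corundum\n' >> demo.tsv
match_terms forward pe demo.tsv     # prints pearl
```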


Frequently Asked Questions

Q. : Where can I obtain dictionary data?
A. : As for English-English dictionaries, WordNet is suggested. As for Japanese-English or English-Japanese dictionaries, EIJIRO is suggested. Gene95 and EDICT are also available. Note that each dictionary has its own license. As for EIJIRO, the utility script `eijiro2tsv' is useful for conversion to TSV.
Q. : How can I make a TSV file from my dictionary?
A. : Generally, the command `sed' is useful. If that does not work, consider `awk' or `perl'.
Q. : What is the format of the database of Diqt?
A. : Diqt uses the B+ tree database provided by QDBM. The key of each record in the database is a converted form of the index of each term. The value of each record is the tab-separated index and descriptive text of each term.
Q. : Can I update my dictionary database?
A. : You can update your database using the command `vlmgr' of QDBM. However, it is easier to edit the TSV file and rebuild the database.
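
As an illustration of the `sed' approach mentioned above, the following sketch converts a source file to TSV. The input format, one entry per line written as "term: description", is an assumption made up for this example, as are the file names `source.txt' and `converted.tsv'.

```shell
#!/bin/sh
# Hypothetical source format: one entry per line, "term: description".
printf 'pearl: gem in the shells of certain mollusks\n' > source.txt
printf 'ruby: gem of deep red mineral corundum\n' >> source.txt

# Keep a literal tab in a variable, because not every sed
# implementation understands \t in the replacement text.
tab=$(printf '\t')

# Replace only the first ": " on each line with a tab.
sed "s/: /$tab/" source.txt > converted.tsv
```

Only the first separator on each line is replaced, so a colon inside the descriptive text is left untouched.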

Copying

Diqt is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version.

Diqt is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Diqt (see the file `COPYING'); if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

Diqt was written by Mikio Hirabayashi. You can contact the author by e-mail at <mikio@users.sourceforge.net>. Suggestions and bug reports are welcome.