Indexing Hebrew texts for later retrieval is not a trivial task. Although several solutions exist, they don't necessarily provide the best results in terms of relevancy. Either way, there is no freely available solution that can index Hebrew even at a very basic level.
HebMorph was started with this in mind. It is a free, open-source effort to make Hebrew properly searchable by various IR software libraries, while maintaining decent recall, precision and relevancy in retrievals. During the work on this project, we will try to come up with different approaches to indexing Hebrew, and provide the tools to perform reliable comparisons between them. This project's ultimate goal is to provide various IR libraries with the best Hebrew IR capabilities possible.
Apache Lucene has been selected as our planning and testing framework, thanks to its advanced capabilities, flexibility, and the author's familiarity with it. During these initial steps, .NET code is being written and used with Lucene.Net (a .NET port of Java Lucene). Once the project stabilizes enough, ports to other languages will follow.
More detailed information on why this project is important can be found in a series of 3 blog posts: Challenges with indexing Hebrew texts (HebMorph, part 1), Finding Hebrew lemmas (HebMorph, part 2) and Open-source Hebrew information retrieval (HebMorph, part 3). The project's roadmap is in the last part.
- The new HebMorph home, which is still being populated with content: http://hebmorph.code972.com
- Code repository: http://github.com/synhershko/HebMorph
- Think-tank mailing list for discussion and planning: https://lists.sourceforge.net/lists/listinfo/hebmorph-thinktank
- For the latest updates, see posts on this blog tagged HebMorph
Some parts of HebMorph are powered by hspell, copyright (C) 2000-2013, Nadav Har'El and Dan Kenigsberg (http://hspell.ivrix.org.il/).
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License v3 as published by the Free Software Foundation.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
Note that not only the programs in the distribution, but also the dictionary files and the generated word lists, are licensed under the AGPLv3. There is no warranty of any kind for the contents of this distribution.
The hspell dictionary files distributed with HebMorph are provided with the license to be used ONLY for search by HebMorph. To get an official hspell distribution under the GPLv2 license, visit their site.
If you are interested in using this product commercially, or without releasing your source code under the AGPLv3, please contact the author.