I (Martin J. Osborne) initiated this project in 2006 when I was the first Managing Editor of the (Open Access) journal Theoretical Economics. (I was a member of the independent group that founded the journal, which was later taken over by the Econometric Society.) The purpose of the project was to convert into BibTeX the plain text references in accepted articles so that they could easily be consistently formatted.

With funding from the Student Experience Program of Project Open Source | Open Access at the University of Toronto, Fabian Qifei Bai created the first version of the conversion script in the Spring of 2007.

When Bai graduated from the University of Toronto later in 2007, I took over the coding and wrote a front end for public use, using Open Journal Systems from the Public Knowledge Project as a framework. I have continued to develop both the conversion engine and the front end since then.

Starting in the summer of 2023, I reimplemented the system using the Laravel framework, at the same time making many improvements to the conversion engine. The new version was released on 15 March 2024.

The source code is available on GitHub: https://github.com/osbornemj/text2bib.

The converter consists of a large number of hand-coded rules for extracting the author, title, and publication information from character strings that represent references. I improve it by occasionally looking for errors in the conversions of files uploaded by users. When I see an error, I add the source and the correct version of the BibTeX entry to a database table of examples and modify the code so that it handles the new case correctly while still converting all the other examples correctly. The examples table currently contains 1105 items. (Unfortunately, error reports by users are few and far between, and almost no user responds to clarifying questions, so the improvement of the algorithm proceeds much more slowly than it could.)
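To make the workflow concrete, here is a minimal Python sketch of the idea: one hand-coded rule plus a check that every stored example still converts correctly. It is an illustration only, not the text2bib code. The rule, the names `ARTICLE_RULE`, `convert`, `EXAMPLES`, and `failing_examples`, and the two references are all invented for this sketch.

```python
import re

# One hypothetical rule for references shaped like
#   "Author (Year). Title. Journal Volume, First-Last."
# A real converter needs many such rules; this pattern is an assumption.
ARTICLE_RULE = re.compile(
    r"^(?P<author>.+?)\s+\((?P<year>\d{4})\)[.,]?\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<journal>.+?)\s+(?P<volume>\d+),\s+"
    r"(?P<first>\d+)\s*[-–]\s*(?P<last>\d+)\.?$"
)


def convert(reference):
    """Return a dict of BibTeX fields, or None if the rule does not match."""
    match = ARTICLE_RULE.match(reference.strip())
    if match is None:
        return None
    f = match.groupdict()
    return {
        "author": f["author"],
        "year": f["year"],
        "title": f["title"],
        "journal": f["journal"],
        "volume": f["volume"],
        "pages": f["first"] + "--" + f["last"],  # BibTeX page-range style
    }


# Stand-in for the examples table: (source string, correct fields).
# Both references are made up for illustration.
EXAMPLES = [
    (
        "Smith, A. and Jones, B. (2001). A made-up title. "
        "Journal of Examples 12, 34-56.",
        {
            "author": "Smith, A. and Jones, B.",
            "year": "2001",
            "title": "A made-up title",
            "journal": "Journal of Examples",
            "volume": "12",
            "pages": "34--56",
        },
    ),
    (
        "Doe, J. (1999), Another invented title. Review of Samples 3, 100–110",
        {
            "author": "Doe, J.",
            "year": "1999",
            "title": "Another invented title",
            "journal": "Review of Samples",
            "volume": "3",
            "pages": "100--110",
        },
    ),
]


def failing_examples(examples):
    """Return the sources whose conversion differs from the stored answer."""
    return [source for source, expected in examples if convert(source) != expected]
```

After a rule is changed to fix a newly found error, rerunning `failing_examples(EXAMPLES)` and getting an empty list indicates that the change has not broken any of the examples collected so far.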

An alternative approach would be to use a machine learning algorithm on some training data. One difficulty with that approach is that the training data would have to consist of a large number of examples (perhaps tens of thousands of them?), and I don't know where such data could be obtained. One possible source would be the conversions marked as correct on this site, after verification by a human who understands BibTeX. However, the number of conversions that are rated as either correct or incorrect is very small, and it would take decades to accumulate a sufficient number. (Google Scholar produces BibTeX entries for every item it covers, but (a) those entries contain many errors and (b) I don't know how they could be obtained.) Suggestions are welcome.