| Undertaken by: | Ronan Crowley |
| and | Paul Connolly |
| Supervised by: | John McKenna |
Week: Monday 16th to Saturday 21st of April
In the first week we divided up the Project workload.
Ronan -> HTK Speaker Models
Paul -> Java Networked Client and Server
However, after the first few days, the HTK models proved exceedingly difficult,
with an ambiguous User's Manual and no basic help for first-time users. Therefore,
we both concentrated on the HTK models and the Tutorial provided in Chapter 3 of
the manual. It quickly became obvious that our initial plan for using HTK to build
up speaker models quickly and efficiently was too optimistic and that this stage would
take considerably longer than originally thought. The remainder of this week was
dedicated to working through the Tutorial and learning more about the intricacies
(both documented and undocumented) of HTK.
Monday 23rd of April
Following a meeting with our Project Supervisor, we decided that a Word Model
for each unique speaker would be required.
For these Word Models, we decided on the following Words:
| ONE | TWO | THREE | FOUR | FIVE | SIX | SEVEN | EIGHT | NINE |
| DOE | RAY | ME | FA | SO | LA | TEE |
Tuesday 24th of April
After a little further experimentation with the Tutorial, we began to build up Word
Models, but with little success. Applying the same techniques proved difficult.
9 hours
Wednesday 25th of April
We continued working on separate Word Modelling systems, recognising more and more errors
and their solutions.
9 hours
Thursday 26th of April
We continued working on Word Models, as far as we could. When I couldn't get any further,
I went back to the Tutorial and started afresh. This time I was able to get through it
relatively quickly, halting on an error during ReEstimation.
9 hours
Friday 27th of April
Again I continued work on the Tutorial, and slowly got over one error, only to be
presented with another serious "hmmdefs" error. Still, we are close to making good
progress.
9 hours
Saturday 28th of April
Based on all our previous experience and errors, we restarted the Tutorial again
(the Tutorial builds a word recogniser for a simple phone dialling system). This
time we knew each error and its respective correction. We were able
to get the vast majority of it done.
6 hours
Monday 30th of April
Continued work on the Tutorial and found numerous other bugs in the Manual.
Issues with the non-re-estimation of the sil and sp models were resolved by
ensuring that monophones1 listed both phones (again missing from the Manual).
10 hours
Tuesday 1st of May
Worked out why the file "aligned.mlf" did not contain transcriptions for
each recorded sample. So we recorded more samples and increased the occurrence
of each monophone.
Tested the Tutorial, to find that it was recognising between 80% and 100%
of words; however, insertions are numerous.
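The gap between a high recognition rate and numerous insertions can be made concrete. HTK's HResults tool reports two figures: percent correct, which ignores insertions, and accuracy, which penalises them. A minimal sketch of both formulas follows; the counts are illustrative only and are not measurements from this project.

```python
def percent_correct(n, deletions, substitutions):
    """Percent correct: insertions are not counted against the score."""
    return 100.0 * (n - deletions - substitutions) / n


def percent_accuracy(n, deletions, substitutions, insertions):
    """Accuracy: as percent correct, but each insertion is also an error."""
    return 100.0 * (n - deletions - substitutions - insertions) / n


# Illustrative counts only (assumed, not taken from our test runs).
n, d, s, i = 10, 0, 1, 3
print(percent_correct(n, d, s))      # 90.0
print(percent_accuracy(n, d, s, i))  # 60.0
```

With these made-up counts, 90% of the words are recognised, yet the three insertions pull the accuracy down to 60%.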
Began work on a Word Model in the same manner as this almost completed Tutorial.
12 hours
Wednesday 2nd of May
We continued work on the definitive Word Model, and got it to the same stage as the
Tutorial.
We also began gathering and analysing all the information available that refers to
HTK and Speaker Verification, looking into a plan of action for once we have Word Models
built.
18:00 Completed Word Model. When queried with the Model Voice (Paul's), results were
either 83% or 100%. When queried with Ronan's voice, results were 28%. (Interestingly, when Ronan
put on an altered voice, results reached 57%.)
We now have a Speaker Word Model built successfully!
Further investigation into Speaker Verification, and the possibility of reducing the number of
training utterances (we have used 17).
10 hours
Thursday 3rd of May
Researched real-time recording of audio in Java. Found the Java Media Framework and experimented
with its use.
Wrote a Random Word Generator that will be used to prompt User for Test cases. This had an added
stipulation that no word would appear twice in a row, thus making it easier for HTK to recognise
different words.
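The no-repeat rule can be sketched as follows. This is a Python illustration, not the generator written for the project, and it assumes the prompts are drawn from the same vocabulary chosen for the Word Models; the function name `random_prompt` is hypothetical.

```python
import random

# Vocabulary from the Word Model list; its use for prompts is an assumption.
WORDS = ["ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT",
         "NINE", "DOE", "RAY", "ME", "FA", "SO", "LA", "TEE"]


def random_prompt(length, words=WORDS, rng=random):
    """Return `length` random words with no word appearing twice in a row."""
    if length > 1 and len(set(words)) < 2:
        raise ValueError("need at least two distinct words to avoid repeats")
    prompt = []
    for _ in range(length):
        # Exclude only the word chosen immediately before this one.
        candidates = [w for w in words if not prompt or w != prompt[-1]]
        prompt.append(rng.choice(candidates))
    return prompt


print(" ".join(random_prompt(5)))
```

A word may still recur later in the prompt; only immediate repetition is ruled out.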
9 hours
    # Coding parameters
    SOURCEKIND = WAVEFORM
    SOURCEFORMAT = WAV
    TARGETKIND = MFCC_0_D_A
    TARGETRATE = 100000.0
    SAVECOMPRESSED = T
    SAVEWITHCRC = T
    WINDOWSIZE = 250000.0
    USEHAMMING = T
    PREEMCOEF = 0.97
    NUMCHANS = 26
    CEPLIFTER = 22
    NUMCEPS = 12
    ENORMALIZE = T
    ZMEANSOURCE = T
11 hours
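As a reading aid for the coding parameters: HTK expresses time values in units of 100 ns, so TARGETRATE and WINDOWSIZE translate to a frame period and a window length in milliseconds, and TARGETKIND with NUMCEPS fixes the size of each feature vector. The helper names below are hypothetical; this is a small arithmetic sketch, not part of the project code.

```python
HTK_UNITS_PER_MS = 10_000  # 1 ms = 10,000 units of 100 ns


def htk_to_ms(value):
    """Convert an HTK time value (100 ns units) to milliseconds."""
    return value / HTK_UNITS_PER_MS


def feature_dim(num_ceps, with_c0=True, with_delta=True, with_accel=True):
    """Vector size for MFCCs plus optional C0 (_0), deltas (_D), accelerations (_A)."""
    static = num_ceps + (1 if with_c0 else 0)
    streams = 1 + (1 if with_delta else 0) + (1 if with_accel else 0)
    return static * streams


print(htk_to_ms(100000.0))  # TARGETRATE -> 10.0 ms frame period
print(htk_to_ms(250000.0))  # WINDOWSIZE -> 25.0 ms analysis window
print(feature_dim(12))      # MFCC_0_D_A with NUMCEPS = 12 -> 39
```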
Saturday 5th of May
Began work on the "Add new User" portion of the UI.
Began work on transmission of recorded audio files from Client to server.
Continued research into Java WAV Recording.
4 hours
Tuesday 8th of May
Completed "Add new User" portion of UI.
Completed file transmission portion of Program.
Continued research into Java WAV recording.
Ronan got REALLY sunburnt playing football at lunchtime!
9 hours
Wednesday 9th of May
For a new user to be added to the system, a Word Model needs to be
automatically built up. To do this, a collection of directories must be
created into which various different files must be copied. These files
include Perl scripts, batch files and configuration files that will
allow for the automation of Word Model building. The client sends the
required 17 WAV files to the server, which are in turn placed into the
correct directory and used for Word Model building.
Today was spent arranging specific files in such a way as to enable
automatic user creation.
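The directory setup for a new user can be sketched as copying a prepared template tree. This is an illustration in Python under assumed names: `create_user_workspace`, the `users` root and the template directory are hypothetical, and the actual layout used on the server is not recorded here.

```python
import shutil
from pathlib import Path


def create_user_workspace(users_root, template_dir, username):
    """Copy a template tree (scripts, batch files, configs) to users/<username>.

    Raises FileExistsError if the user's directory already exists, so an
    existing user's files are never overwritten.
    """
    user_dir = Path(users_root) / username
    # copytree creates the destination (and missing parents) and copies
    # every file and sub-directory found in the template.
    shutil.copytree(template_dir, user_dir)
    return user_dir
```

The WAV files received from the client would then be written into the returned directory before model building starts.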
Completed automatic vFloors to macros conversion.
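In the HTK tutorial, the macros file is the global options macro (~o) followed by the contents of vFloors. One way to automate this, sketched here in Python rather than as the script actually used, is to take every line of the prototype definition that precedes the first ~h line as the header. That rule, and the name `build_macros`, are assumptions.

```python
def build_macros(proto_text, vfloors_text):
    """Return macros text: the ~o header lines from proto, then vFloors."""
    header = []
    for line in proto_text.splitlines():
        if line.startswith("~h"):
            break  # the HMM definition starts here; the header is complete
        header.append(line)
    return "\n".join(header + vfloors_text.splitlines()) + "\n"
```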
Work was also done on minor system details, such as automatic MLF file
creation, based on the contents of the Test file.
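Automatic MLF creation can be sketched as below. The MLF layout (a `#!MLF!#` header, a quoted label name, one word per line, and a terminating full stop) is HTK's standard word-level format. The input format is an assumption borrowed from the tutorial's prompts file, where each line holds an utterance name followed by its words; the layout of our own Test file is not described here.

```python
def prompts_to_mlf(prompt_lines):
    """Build a word-level MLF from lines such as 'S0001 ONE TWO THREE'."""
    out = ["#!MLF!#"]
    for line in prompt_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, words = fields[0], fields[1:]
        out.append(f'"*/{name}.lab"')  # label name pattern for this utterance
        out.extend(words)              # one word per line
        out.append(".")                # end of this utterance's transcription
    return "\n".join(out) + "\n"


print(prompts_to_mlf(["S0001 ONE TWO", "S0002 DOE"]), end="")
```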
12 hours
Thursday 10th of May
Completed perl script for Automatic sil to sp conversion.
Had a meeting with our Project Supervisor to discuss possible better
ways of verifying speakers. We began investigation into maximum likelihood
estimates.
9 hours
Friday 11th of May
Completed additional verification testing possibilities, but the results
were not as good as expected.
Further investigated log-likelihoods.
Began system integration, writing Client and Server code, which includes
pre-written functions.
12 hours
Saturday 12th of May
Continued Client and Server design and implementation. Integrated Test
text generation, window pop-ups, WAV recording and automatic file
transmission.
Encountered problems with percentage accuracy.
4 hours
Monday 14th of May
Continued work on the integration of the system.
Had a discussion with our Supervisor and decided to investigate Speaker vs
Anti-Speaker Model systems.
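The general idea of a Speaker vs Anti-Speaker system can be sketched as a comparison of two log-likelihoods for the same utterance, one from the claimed speaker's model and one from the anti-speaker model. The function name, the scores and the threshold below are all hypothetical; how the scores are obtained from HTK is not shown.

```python
def verify(speaker_loglik, anti_loglik, threshold):
    """Accept the claim if the speaker model beats the anti-speaker model
    by more than `threshold` (a difference of log-likelihoods)."""
    return (speaker_loglik - anti_loglik) > threshold


# Hypothetical scores: the margin is 600 in the first case, 300 in the second.
print(verify(-4000.0, -4600.0, threshold=400.0))  # True
print(verify(-4000.0, -4300.0, threshold=400.0))  # False
```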
12 hours
Tuesday 15th of May
Continued work on the integration of the system.
Multiple Perl Scripts were written to control the creation of new users:
1) Make correct mappings (e.g. users/ronan/S0001.wav users/ronan/S0001.mfc)
2) Fix labels (e.g. "S0001.lab" to "users/ronan/S0001.lab")
3) Modify existing Perl files.
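Steps 1 and 2 can be sketched as follows. The project scripts are written in Perl; this Python version only illustrates the two transformations, and the function names are hypothetical. Each mapping line pairs a source WAV with its target feature file, and each bare label name gains the user's directory as a prefix.

```python
import re
from pathlib import PurePosixPath

# A quoted label name with no directory part, e.g. "S0001.lab"
BARE_LABEL = re.compile(r'^"([^"/]+\.lab)"$')


def hcopy_mappings(user_dir, wav_names):
    """Return 'source.wav target.mfc' lines, one per recording."""
    lines = []
    for wav in wav_names:
        src = PurePosixPath(user_dir) / wav
        lines.append(f"{src} {src.with_suffix('.mfc')}")
    return lines


def fix_label(line, user_dir):
    """Prefix a bare label name with the user's directory; leave other lines alone."""
    match = BARE_LABEL.match(line)
    if match:
        return f'"{user_dir}/{match.group(1)}"'
    return line


print(hcopy_mappings("users/ronan", ["S0001.wav"])[0])
print(fix_label('"S0001.lab"', "users/ronan"))
```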
12 hours
Wednesday 16th of May
Continued work on the integration of the system.
Gathered necessary information for Anti-Speaker Model Building.
Will get people to record tomorrow.
12 hours
Thursday 17th of May
Began recording the 10 Speakers whose Data will be combined to create the
anti-speaker model. This is proving quite a lengthy process, as each
person wants to talk for about 20 minutes after recording!
12 hours
Friday 18th of May
Finished recording of data.
Began work on analysis and threshold calculation.
A clear-cut threshold of acceptance (c. 500) began to appear.
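One simple way to look for a threshold of acceptance is to sweep the candidate values and count the errors each one produces. The sketch below uses invented scores purely for illustration; the kind of score being thresholded is not specified in this log, and the function names are hypothetical.

```python
def count_errors(genuine, impostor, threshold):
    """Return (false rejects, false accepts) when accepting scores > threshold."""
    false_rejects = sum(1 for s in genuine if s <= threshold)
    false_accepts = sum(1 for s in impostor if s > threshold)
    return false_rejects, false_accepts


def best_threshold(genuine, impostor):
    """Pick the observed score that minimises total errors (lowest wins ties)."""
    candidates = sorted(set(genuine) | set(impostor))
    return min(candidates,
               key=lambda t: sum(count_errors(genuine, impostor, t)))


# Invented scores: the two groups separate cleanly, so a clear threshold exists.
genuine = [620, 710, 580]
impostor = [310, 450, 495]
t = best_threshold(genuine, impostor)
print(t, count_errors(genuine, impostor, t))  # 495 (0, 0)
```

When the genuine and impostor scores overlap, no candidate reaches zero errors and the choice becomes a trade-off between false accepts and false rejects.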
12 hours
Saturday 19th of May
Corrected minor GUI and functionality issues.
Continued with threshold investigation.
Designed an interface for the web documentation, and began
documentation.
3 hours
Monday 21st of May
Recorded much more data for testing and training purposes.
Previous trends that indicated a clear-cut threshold were no longer
apparent.
Worked on extra functionality and presentation.
Continued Documentation.
12 hours