Project Activity Log

Speaker Verification Implemented Security

Undertaken by:	Ronan Crowley
and	Paul Connolly
Supervised by:	John McKenna

Week: Monday 16th to Saturday 21st of April
In the first week we divided up the Project workload.
Ronan -> HTK Speaker Models
Paul -> Java Networked Client and Server
However, after the first few days, the HTK models proved exceedingly difficult, with an ambiguous user manual and no basic help for first-time users. We therefore both concentrated on the HTK models and the Tutorial provided in Chapter 3 of the manual. It quickly became obvious that our initial plan to use HTK to build speaker models quickly and efficiently was too optimistic, and that this stage would take considerably longer than originally thought. The remainder of the week was dedicated to working through the Tutorial and learning more about the intricacies of HTK, both documented and undocumented.

Monday 23rd of April
Following a meeting with our Project Supervisor, we decided that a Word Model for each unique speaker would be required.
For these Word Models, we decided on the following Words:

ONE	TWO	THREE	FOUR	FIVE	SIX	SEVEN	EIGHT	NINE
DOE	RAY	ME	FA	SO	LA	TEE

9 hours

Tuesday 24th of April
After a little further experimentation with the Tutorial, we began to build up Word Models, but with little success. Applying the same techniques proved difficult.
9 hours

Wednesday 25th of April
We continued working on separate Word Modelling systems, recognising more and more errors and their solutions.
9 hours

Thursday 26th of April
We continued working on Word Models as far as we could. When I couldn't get any further, I went back to the Tutorial and started afresh. This time I was able to get through it relatively quickly, halting on an error during re-estimation.
9 hours

Friday 27th of April
Again I continued work on the Tutorial and slowly got over one error, only to be presented with another serious "hmmdefs" error. However, we are close to making good progress.
9 hours

Saturday 28th of April
Based on all our previous experience and errors, we restarted the Tutorial again (the Tutorial builds a word recogniser for a simple phone dialling system). This time we knew each error and its respective correction, and we were able to get the vast majority of it done.
6 hours

Monday 30th of April
Continued work on the Tutorial and found numerous other bugs in the Manual.
Issues with the sil and sp models not being re-estimated were resolved by ensuring that monophones1 listed both phones (again, this is missing from the Manual).
10 hours

Tuesday 1st of May
Worked out why the file "aligned.mlf" did not contain transcriptions for each recorded sample. So we recorded more samples and increased the occurrence of each monophone.
Tested the Tutorial and found that it was recognising between 80% and 100% of words; however, insertions are numerous.
Began work on a Word Model in the same manner as this almost completed Tutorial.
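The difference between words recognised and insertions noted above can be made concrete with a small sketch. HTK's HResults tool reports a percent-correct figure, which ignores insertions, and an accuracy figure, which penalises them. The counts used in the example call are invented for illustration and are not our measured results.

```python
def percent_correct(n, deletions, substitutions):
    """Percent correct = (N - D - S) / N * 100. Insertions are ignored."""
    return 100.0 * (n - deletions - substitutions) / n


def percent_accuracy(n, deletions, substitutions, insertions):
    """Accuracy = (N - D - S - I) / N * 100. Insertions count against the score."""
    return 100.0 * (n - deletions - substitutions - insertions) / n


if __name__ == "__main__":
    # Hypothetical counts: 10 reference words, 1 substitution, 4 insertions.
    print(percent_correct(10, 0, 1))
    print(percent_accuracy(10, 0, 1, 4))
```

With many insertions the percent-correct figure can look healthy while the accuracy figure is much lower, which is why both are worth watching.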
12 hours

Wednesday 2nd of May
We continued work on the definitive Word Model, and got it to the same stage as the Tutorial.
We also began gathering and analysing all the available information on HTK and Speaker Verification, looking into a plan of action for once we have Word Models built.
18:00 Completed the Word Model. When queried with the model voice (Paul's), results were either 83% or 100%. When queried with Ronan's voice, results were 28%. (Interestingly, when Ronan put on a different voice, results reached 57%.)
We now have a Speaker Word Model built successfully!
Further investigated Speaker Verification and the possibility of reducing the number of training utterances (we have used 17).
10 hours

Thursday 3rd of May
Researched into Java Real-time recording of Audio. Found Java Media Framework. Experimented with its use.
Wrote a Random Word Generator that will be used to prompt User for Test cases. This had an added stipulation that no word would appear twice in a row, thus making it easier for HTK to recognise different words.
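A minimal sketch of the no-repeat rule described above, written here in Python for brevity rather than as our actual generator. The function name and the use of our full word list as the vocabulary are assumptions made for illustration.

```python
import random

# The words chosen for the Word Models.
WORDS = ["ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT", "NINE",
         "DOE", "RAY", "ME", "FA", "SO", "LA", "TEE"]


def generate_prompt(length, words=WORDS, rng=random):
    """Return `length` random words with no word appearing twice in a row."""
    if length > 1 and len(set(words)) < 2:
        raise ValueError("need at least two distinct words to avoid repeats")
    prompt = []
    for _ in range(length):
        # Exclude the previous word so the same word is never adjacent to itself.
        choices = [w for w in words if not prompt or w != prompt[-1]]
        prompt.append(rng.choice(choices))
    return prompt


if __name__ == "__main__":
    print(" ".join(generate_prompt(5)))
```

Filtering the candidate list before choosing avoids a retry loop and keeps every remaining word equally likely.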
9 hours

Friday 4th of May
Breakthrough! We worked out how to use pre-recorded WAV files to generate MFCCs. (This is done using the HCopy tool.) To do this we used the config file below:

# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAV
TARGETKIND = MFCC_0_D_A
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALIZE = T
ZMEANSOURCE = T
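
As a reading aid for the values above: HTK expresses times in units of 100 ns, so TARGETRATE and WINDOWSIZE are a frame period and a window length in those units. The sketch below converts them to milliseconds and works out the vector size implied by TARGETKIND and NUMCEPS. The helper names are ours, and the qualifier handling covers only the _0, _E, _D and _A qualifiers.

```python
HTK_UNITS_PER_MS = 10_000  # HTK time values are in units of 100 ns


def htk_time_to_ms(value):
    """Convert an HTK time value (100 ns units) to milliseconds."""
    return value / HTK_UNITS_PER_MS


def feature_vector_size(num_ceps, target_kind):
    """Vector size for a kind such as MFCC_0_D_A (only _0, _E, _D, _A handled)."""
    qualifiers = target_kind.split("_")[1:]
    # _0 (zeroth cepstral coefficient) and _E (energy) each add one static term.
    static = num_ceps + sum(1 for q in qualifiers if q in ("0", "E"))
    # _D (delta) and _A (acceleration) each append a copy of the static size.
    blocks = 1 + sum(1 for q in qualifiers if q in ("D", "A"))
    return static * blocks


if __name__ == "__main__":
    print(htk_time_to_ms(100000.0))               # frame period in ms
    print(htk_time_to_ms(250000.0))               # window length in ms
    print(feature_vector_size(12, "MFCC_0_D_A"))  # coefficients per frame
```

So this config produces one 39-element vector every 10 ms from a 25 ms analysis window.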

11 hours

Saturday 5th of May
Began work on the "Add new User" portion of the UI.
Began work on transmission of recorded audio files from Client to server.
Continued research into Java WAV Recording.
4 hours

Tuesday 8th of May
Completed "Add new User" portion of UI.
Completed file transmission portion of Program
Continued research into Java WAV recording.
Ronan got REALLY sunburnt playing football at lunchtime!
9 hours

Wednesday 9th of May
For a new user to be added to the system, a Word Model needs to be built up automatically. To do this, a collection of directories must be created into which various files must be copied. These files include Perl scripts, batch files and configuration files that allow Word Model building to be automated. The client sends the required 17 WAV files to the server, where they are placed in the correct directory and used for Word Model building.
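The directory set-up described above might be sketched as follows. This is an illustration in Python, not our actual batch and Perl tooling; the function name, the flat "template" directory of scripts and config files, and the one-directory-per-user layout are all assumptions.

```python
import shutil
from pathlib import Path


def create_user_workspace(users_root, template_dir, username, wav_files):
    """Create users_root/username, copy in the template files and the WAVs."""
    user_dir = Path(users_root) / username
    # Fail loudly if the user already exists rather than overwrite a model.
    user_dir.mkdir(parents=True, exist_ok=False)
    # Copy the scripts and configuration files (hypothetical template dir).
    for item in Path(template_dir).iterdir():
        if item.is_file():
            shutil.copy2(item, user_dir / item.name)
    # Place the recordings received from the client alongside them.
    for wav in wav_files:
        shutil.copy2(wav, user_dir / Path(wav).name)
    return user_dir
```

Refusing to reuse an existing directory is a deliberate choice in this sketch, so that a second enrolment under the same name cannot silently mix two speakers' data.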
Today was spent arranging specific files in such a way as to enable automatic user creation.
Completed automatic vFloors to macros conversion.
Work was also done on minor system details, such as automatic MLF file creation, based on the contents of the Test file.
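A sketch of the kind of MLF generation mentioned above. An HTK Master Label File begins with the line #!MLF!#, followed for each sample by a quoted label-file pattern, one word per line, and a terminating full stop. The input structure here (pairs of sample name and prompted words) is an assumption about how the Test file contents might be represented, and the "*/" pattern prefix is one common convention rather than necessarily the one we used.

```python
def build_mlf(prompts):
    """Build MLF text from (sample_name, [words]) pairs."""
    lines = ["#!MLF!#"]
    for sample_name, words in prompts:
        # The quoted pattern names the label file this transcription stands for.
        lines.append(f'"*/{sample_name}.lab"')
        lines.extend(words)
        # A lone full stop ends each transcription.
        lines.append(".")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_mlf([("S0001", ["ONE", "TWO", "DOE"])]), end="")
```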
12 hours

Thursday 10th of May
Completed Perl script for automatic sil to sp conversion.
Had a meeting with our Project Supervisor to discuss possible better ways of verifying speakers. We began investigating maximum likelihood estimates.
9 hours

Friday 11th of May
Completed additional verification testing possibilities, but the results were not as good as expected.
Further investigated log-likelihoods.
Began system integration, writing Client and Server code, which includes pre-written functions.
12 hours

Saturday 12th of May
Continued Client and Server design and implementation. Integrated Test text generation, window pop-ups, WAV recording and automatic file transmission.
Encountered problems with % accuracy.
4 hours

Monday 14th of May
Continued work on the integration of the system.
Had a discussion with our Supervisor and decided to investigate Speaker vs Anti-Speaker Model systems.
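One common way to use a speaker model together with an anti-speaker model is a log-likelihood ratio test: score the same utterance against both models and accept the claim only if the speaker model wins by a sufficient margin. The sketch below illustrates that idea only; it assumes the two log-likelihoods have already been obtained from the recogniser, and the numbers and threshold in the example are invented.

```python
def llr_score(speaker_loglik, anti_loglik):
    """Log-likelihood ratio: how much better the claimed speaker's model fits."""
    return speaker_loglik - anti_loglik


def verify(speaker_loglik, anti_loglik, threshold):
    """Accept the identity claim if the ratio reaches the threshold."""
    return llr_score(speaker_loglik, anti_loglik) >= threshold


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one utterance.
    print(verify(-4200.0, -4900.0, threshold=500.0))
    print(verify(-4800.0, -4900.0, threshold=500.0))
```

Subtracting the anti-speaker score normalises away effects that hit both models equally, such as utterance length or recording conditions.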
12 hours

Tuesday 15th of May
Continued work on the integration of the system.
Multiple Perl Scripts were written to control the creation of new users:
1) Make correct mappings (e.g. users/ronan/S0001.wav users/ronan/S0001.mfc)
2) Fix labels (e.g. "S0001.lab" to "users/ronan/S0001.lab")
3) Modify existing Perl files.
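Steps 1 and 2 above might look like the following. This is a Python illustration rather than the Perl we wrote; the function names are hypothetical, and the label rewrite assumes each label name sits on its own quoted line, optionally prefixed with "*/".

```python
import re


def make_mapping_lines(user_dir, sample_names):
    """One 'source.wav target.mfc' pair per line, as in step 1."""
    return [f"{user_dir}/{name}.wav {user_dir}/{name}.mfc" for name in sample_names]


def fix_label_paths(label_text, user_dir):
    """Rewrite quoted label names such as "S0001.lab" to include the user path."""
    pattern = re.compile(r'^"(?:\*/)?([^"/]+\.lab)"$', flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in user_dir.
    return pattern.sub(lambda m: f'"{user_dir}/{m.group(1)}"', label_text)


if __name__ == "__main__":
    print("\n".join(make_mapping_lines("users/ronan", ["S0001", "S0002"])))
    print(fix_label_paths('"S0001.lab"\nONE\n.\n', "users/ronan"), end="")
```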
12 hours

Wednesday 16th of May
Continued work on the integration of the system.
Gathered necessary information for Anti-Speaker Model Building.
Will get people to record tomorrow.
12 hours

Thursday 17th of May
Began recording the 10 speakers whose data will be combined to create the anti-speaker model. This is proving quite a lengthy process, as each person wants to talk for about 20 minutes after recording!
12 hours

Friday 18th of May
Finished recording of data.
Began work on analysis and threshold calculation
A clear-cut threshold of acceptance (c. 500) began to appear.
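One simple way to look for such a threshold is to collect scores from genuine attempts and from impostor attempts, then pick the cut-off that misclassifies the fewest of them. The sketch below shows that approach under stated assumptions: it is not necessarily the analysis we carried out, and the scores in the example are invented.

```python
def error_counts(genuine, impostor, threshold):
    """Return (false rejects, false accepts) when accepting scores >= threshold."""
    false_rejects = sum(1 for s in genuine if s < threshold)
    false_accepts = sum(1 for s in impostor if s >= threshold)
    return false_rejects, false_accepts


def best_threshold(genuine, impostor):
    """Lowest observed score that minimises the total number of errors."""
    candidates = sorted(set(genuine) | set(impostor))
    return min(candidates, key=lambda t: sum(error_counts(genuine, impostor, t)))


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    genuine = [620, 710, 540]
    impostor = [310, 450, 280]
    t = best_threshold(genuine, impostor)
    print(t, error_counts(genuine, impostor, t))
```

A threshold found this way is only as good as the data behind it, so it should be rechecked whenever more recordings are added.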
12 hours

Saturday 19th of May
Corrected minor GUI and functionality issues.
Continued with threshold investigation.
Designed an interface for the web documentation, and began documentation.
3 hours

Monday 21st of May
Recorded much more data for testing and training purposes.
The previous trends that indicated a clear-cut threshold were no longer apparent.
Worked on extra functionality and presentation.
Continued Documentation.
12 hours