AddestorionVayanis on DeviantArt: https://www.deviantart.com/addestorionvayanis/art/MMM-MMD-Automatic-Lipsyncing-for-vocal-tracks-424527548

MMM / MMD Automatic Lipsyncing for vocal tracks


This tutorial is a summary of what I have been doing for a side project for my patron.
It shows how to obtain a decent enough MMD lipsync from an acapella (vocal-only) track with as little effort on your end as possible.  I am writing this for facial data because, although Mr Mogg's VMDReductionTool sites.google.com/site/moggproj… can smooth out bone motions (usually from Kinect), it cannot do the same for facial data.
Therefore, another method has to be used.

VIDEO VERSION - www.youtube.com/watch?v=ozKBYG…

Stuff Required:
MikuMikuMoving
Lipsyncloid plugin
A model which has at least the facials a, i, o and u.  The facial 'e' is not required.
VMDConverter2
A program which can open spreadsheets.  Microsoft Excel works for most cases.
A vocal-only soundtrack.  This can be either a song vocal or a recorded microphone track.  The file must be a .wav file encoded at 48 kHz, 16 bits.  I use Super by erightsoft www.erightsoft.com/SUPER.html to convert my audio files to this format.  Additional info: 1536 kb/s using WAV-sowt (PCM 16 little endian), 2 channels.

MikuMikuDance (optional.  I use it because all my effects files work in MMD, but not necessarily in MMM)
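
If you want to check a file before feeding it to the plugin, here is a minimal sketch using Python's standard wave module.  The function names are my own for illustration and are not part of any of the tools above.  It only reads the .wav header and compares it against the 48 kHz / 16 bit / 2 channel format listed above.

```python
import wave

def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels) for a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()

def matches_tutorial_format(path):
    """True if the file is 48 kHz, 16 bit, 2 channel, as listed above."""
    return describe_wav(path) == (48000, 16, 2)

def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bitrate in kb/s (1 kb = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000
```

The 1536 kb/s figure in the list is just this arithmetic: 48000 x 16 x 2 = 1,536,000 bits per second.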

GETTING A LIPSYNC FROM AN ACAPELLA MUSIC FILE
1. Download MikuMikuMoving sites.google.com/site/moggproj…
2. Download Lipsyncloid by なヲタ (nawota1105 on bowlroll) - www.nicovideo.jp/watch/sm22506… I found it here - www6.atwiki.jp/vpvpwiki/pages/…
3. NOTE - when you download the files, before unzipping or unrarring them, right-click the file and open Properties.  Then click Unblock.  If your PMDeditor or MMD runs into a gazillion errors on startup saying plugins cannot load (HRESULT: 0x80131515), the reason is that Windows is blocking all the .dll files as it deems them unsafe.  Unblocking fixes it.
4. Unzip your stuff.
5. Copy lipsyncloid.dll into the Plugins folder of MikuMikuMoving.

USING LIPSYNCLOID - 3:32
1. Open MikuMikuMoving.  The Lipsyncloid plugin should show up in the Plugins section of MikuMikuMoving.
2. Load a model and the audio file.
3. Press Lipsyncloid.  This will convert the .wav file.
4. If you encounter an error, it means that your .wav file isn't encoded in the format I told you to convert it to.  Fix it.
5. If you get an out-of-memory alert, it means your .wav file is too long.  A safe number of frames which it can successfully transcribe is 15,000.  Split up the remainder if required.  You may also need to shut down other programs.
6.  You get a happy notification if it's all okay.
7.  The motion is loaded onto the model.  It transcribes only for the facials a, i, o and u.  It does so for every single frame.
8.  If you are fine with the results you obtained, you may stop here.  Otherwise, keep reading.
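
If your track is longer than the safe 15,000 frames from step 5, the splitting can be scripted.  This is a rough sketch with Python's standard wave module, not what I did in the video.  It assumes 30 motion frames per second, so 15,000 frames is 500 seconds of audio, and the function name and output file naming are made up for the example.  It cuts at fixed times, so a cut may land in the middle of a word.

```python
import wave

MMD_FPS = 30          # assumption: 30 motion frames per second
SAFE_FRAMES = 15000   # safe length quoted in step 5

def split_wav(src_path, out_prefix, max_seconds=SAFE_FRAMES / MMD_FPS):
    """Write src_path as out_prefix_000.wav, out_prefix_001.wav, ...

    Each part is at most max_seconds long; returns the list of paths written.
    """
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        samples_per_part = int(max_seconds * src.getframerate())
        while True:
            chunk = src.readframes(samples_per_part)
            if not chunk:
                break
            out_path = f"{out_prefix}_{len(out_paths):03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)   # same rate, sample width and channels
                dst.writeframes(chunk)  # header length is fixed up on close
            out_paths.append(out_path)
    return out_paths
```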


CONVERTING THE MOTION INTO A SPREADSHEET FORMAT 6:36
1. You will probably notice that, since the facial changes every frame, it is very shaky.  You could manually edit the values in MikuMikuDance/Moving, but this section teaches you how to fix some of that using a spreadsheet.
2. Select all the facial data in the Sequence tab.  Then go to MikuMikuMoving > File > Export Motion, choose a folder, select VMD, choose only the a, i, o and u facials, then Export.
3a. Download Yumin3123's VMDconverter yumin3123.at.webry.info/200810…
or
3b. If the previous link does not work, searching bowlroll gets you a graphical version of the VMDconverter: bowlroll.net/file/13705

3a - VMDConverter
1. The VMDconverter is a program which converts VMD motion files to CSV spreadsheet files, and vice versa.  Microsoft Excel can be used to open them.
2. To convert a motion to CSV, drag and drop the saved VMD file directly onto VMDConverter.exe.  Wait for it to finish.  It will close by itself once it's done.
3. The converted file will appear in the same folder as the VMD file.
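
On Windows, dropping a file onto an .exe starts that program with the file's path as its command-line argument, so the same conversion can be launched from a script.  A small sketch, assuming the converter behaves the same way when started like this (I have only used drag and drop myself); the function names and any paths you pass in are placeholders.

```python
import subprocess

def converter_command(converter_exe, motion_file):
    """Command line equivalent to dropping motion_file onto the converter."""
    return [str(converter_exe), str(motion_file)]

def run_converter(converter_exe, motion_file):
    """Start the converter, wait for it to close, and return its exit code."""
    result = subprocess.run(converter_command(converter_exe, motion_file), check=False)
    return result.returncode
```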

3b - VMDconverter (graphical version)
1. This version is an .exe file which opens a graphical menu.
2.  Use the program to open .vmd files and convert them into .csv files; the same program converts .csv files back into .vmd files.
3.  Open .vmd or .csv files using the File drop-down menu, and once the files are open, click the button below and wait for it to process them.  You'll get a notification when it's done.



[I'm going to be using Microsoft Excel from here on]

MAKING SENSE OF THE CSV FILE 10:40
1.  Open your .csv file.  CSV stands for 'comma separated values'.  You can also open it with Notepad and edit the values there, but it's a lot easier to open it with Excel or a proper spreadsheet program.  Read more here - en.wikipedia.org/wiki/Comma-se…
2.  When you open the .csv file in Excel, you may encounter an error where you can't make sense of the data.  If so, it means your Excel is defaulting to English first.  To fix it, go to Excel > Excel Options > Popular > Language Settings > Primary language settings > Japanese (Japan)
3.  From now on, when you open the file, it should default to Japanese first, and you can see the right names and stuff.  It will ALWAYS use the Japanese naming.
4.  Now that we have that settled, the first row is the description.  It's usually Vocaloid motion data0002.  Don't touch this.  The second row is the name of the model that the motion is for.  If you alter this, you'll just be alerted when the name here and the name of the model you load the motion onto don't match.  The third row is the number of effective frames.  There is one per facial per frame.  This number must be updated whenever the number of rows changes.
5.  From the 4th row onwards, the first column is the facial name, the second is the frame and the third is the value.  For facials the value goes from 0 to 1.
6.  After the very last frame, there are usually two rows with zeroes.  When recounting how many rows there are for the number of frames, do not count the rows with the zeroes.
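
The layout described above can be read with a few lines of Python.  This is only a sketch of that layout, not the converter's own code: the function name is mine, and it trusts the count in the third row to tell the frame rows apart from the rows of zeroes at the end.

```python
import csv
import io

def parse_facial_csv(text):
    """Split converter CSV text into (header_rows, data_rows, trailer_rows).

    header_rows:  the first three rows (description, model name, frame count).
    data_rows:    (facial_name, frame, value) tuples, as many as row 3 states.
    trailer_rows: whatever follows (the rows of zeroes), kept untouched.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[:3], rows[3:]
    count = int(header[2][0])
    data = [(row[0], int(row[1]), float(row[2])) for row in body[:count]]
    return header, data, body[count:]

# Made-up sample in the layout described above (not real exported data).
SAMPLE = "\n".join([
    "Vocaloid Motion Data 0002",
    "SampleModel",
    "4",
    "あ,0,0",
    "あ,5,0.8",
    "い,0,0",
    "い,7,0.4",
    "0",
    "0",
])
header, data, trailer = parse_facial_csv(SAMPLE)
```

To read a real file you have to pick a text encoding when opening it.  Given the Japanese names, Shift-JIS (encoding="cp932" in Python) is my guess, but that is an assumption and not something I have checked against the converter.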


OTHER TYPES OF CSV FILES
1.  If you exported a bone, the 4th, 5th and 6th columns are the X, Y and Z coordinates, the 7th, 8th and 9th columns are the X, Y and Z rotation angles, and the 10th column is the interpolation data (I don't recommend touching the 10th column).  Rotation values can go below -180 degrees and above 180 degrees, but once you load the motion, MMD will automatically change them to something between -180 and 180.
2.  If you export camera motion, the 1st column is the frame, and the remainder is XYZ and rotation XYZ.  I'm not sure what the next column does, but the final column with the long combo controls view angle and perspective mode.  Try not to mess with this while in the spreadsheet editor.  It is safer to just copy and paste the values if you know what setting you want.
3.  If you export light motion, the first column is the frame, the next 3 columns are RGB with ranges from 0 to 1 (the default setting of 154 has a value of 0.602), and the remaining 3 columns are the XYZ coordinates.
4.  Shadow does not work, however.
5.  As usual, for all these other types of exported motion, there are some zeroes after the very last row.  Don't touch them.
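
For the light RGB values, the 154 and 0.602 pair quoted above is consistent with dividing the setting by 256 (154 / 256 is about 0.602, while 154 / 255 would be about 0.604).  That is only my inference from that one pair, so treat the divisor in this sketch as a guess.  The function name is made up.

```python
def slider_to_csv_value(slider, divisor=256):
    """Scale a light colour setting to the 0 to 1 range used in the CSV.

    divisor=256 is a guess that reproduces the 154 -> 0.602 pair quoted above.
    """
    return round(slider / divisor, 3)
```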

REMOVING REDUNDANT FRAMES 12:42
1.  If you plan to use the facial data in MikuMikuDance instead of MikuMikuMoving, there is a limit of 20,000 facial points, so you need to bring down your facial frame count.
2.  The easiest thing to do is to sort them first.  Select ALL the useful frames, then go to Home > Sort & Filter > Custom Sort.  Sort by Column (choose Column A), Values, A to Z.  This will sort the data according to the facial name.  Don't worry, the columns on the right are sorted along with it.
3.  Facials whose values are all zero are easy to spot, so just delete those entire rows.  However, make sure that two zeroes remain, one at the start and one at the end, to keep the value at zero for everything in between those two frames.
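
The same idea as step 3, sketched in Python for anyone who prefers a script to deleting rows by hand.  It is a slight generalisation of what I do in Excel: inside every stretch of consecutive zero values it keeps only the first and the last row.  The function name is made up, and the input is the (frame, value) list of ONE facial, already sorted by frame.

```python
def collapse_zero_runs(frames):
    """Keep only the first and last row of each run of consecutive zeroes.

    frames: list of (frame, value) pairs for one facial, sorted by frame.
    """
    kept = []
    for i, (frame, value) in enumerate(frames):
        prev_zero = i > 0 and frames[i - 1][1] == 0
        next_zero = i + 1 < len(frames) and frames[i + 1][1] == 0
        if value == 0 and prev_zero and next_zero:
            continue  # interior of a zero run, so it is redundant
        kept.append((frame, value))
    return kept
```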

REMOVING INTERMEDIATE FRAMES 14:37
1.  In order to remove the bulk of the intermediate frames, we need to identify the peaks and valleys (maximum/minimum points in maths terms).  To identify these points, I use Excel formulas.
2.  I use the function =IF(B7>B6, 1, 0) in a new column, and =IF(B7>B8, 1, 0) in another.  (B here stands for the column holding the values being compared; change the letter if your values sit in a different column.)  Then use a third column to add together the values in those two columns.  Any row with a maximum point will have a value of 2, and any row with a minimum point will have a value of 0.  Every other row will have a value of 1, and I recommend deleting those.
3.  To retain the values, select those columns, copy, and at the same location (or the top left-hand corner of your selection), right-click and choose 'Paste Special'.  Select 'Values'.  This re-pastes the data as values rather than formulas (formula results would change if the position of the rows changes).
4.  Use Custom Sort to sort by the column with the 0s, 1s and 2s.  Then delete all the rows with a value of 1.
Notes: this doesn't quite work on sound which has an alternating high/low (sinusoidal) pattern.  But if you encounter this, you can easily delete the unwanted facial frames in MMD or MMM, since you have cleaned out most of the rest.
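
Here is the same peak and valley test written as a Python sketch, mirroring the two IF formulas and their sum.  The function name is mine, and keeping the very first and last row of each facial is my own assumption (in the spreadsheet those rows get compared against a neighbouring facial or the header instead).  Run it on one facial at a time, sorted by frame.

```python
def keep_peaks_and_valleys(frames):
    """Drop the in-between rows, keeping local maxima and minima.

    frames: list of (frame, value) pairs for one facial, sorted by frame.
    """
    kept = []
    last = len(frames) - 1
    for i, (frame, value) in enumerate(frames):
        if i == 0 or i == last:
            kept.append((frame, value))  # assumption: always keep the endpoints
            continue
        above_previous = 1 if value > frames[i - 1][1] else 0  # =IF(B7>B6, 1, 0)
        above_next = 1 if value > frames[i + 1][1] else 0      # =IF(B7>B8, 1, 0)
        score = above_previous + above_next  # 2 = peak, 0 = valley, 1 = in between
        if score != 1:
            kept.append((frame, value))
    return kept
```

One thing this makes easy to see: because the comparison is a strict 'greater than', two equal values side by side at the top of a peak both score 1 and get removed, while equal values at the bottom of a valley both score 0 and are kept.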

PREPARATION BEFORE REEXPORTING TO VMD 22:45
1.  Once you're done with your operations, delete any columns which you used for fancy maths operations.
2.  Count how many effective rows there are and set the 'number of effective frames' value (the third row) to the right amount.
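
If you edited the data in a script instead of Excel, the recount in step 2 can be done while writing the file back out.  A sketch, assuming you already hold the three header rows, the (facial name, frame, value) rows and the trailing rows of zeroes as Python lists; the function name is made up and I have not checked how picky the converter is about number formatting.

```python
import csv
import io

def rebuild_csv(header_rows, data_rows, trailer_rows):
    """Return CSV text whose third row is set to the number of data rows."""
    header = [list(row) for row in header_rows]
    header[2][0] = str(len(data_rows))  # the 'number of effective frames' row
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\r\n")
    writer.writerows(header)
    writer.writerows(data_rows)
    writer.writerows(trailer_rows)  # the rows of zeroes are written back as-is
    return out.getvalue()
```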

REEXPORT TO VMD 22:58
1.  Save your .csv file.
2.  Make sure the folder containing your .csv file does not also contain a .vmd file with the same name.  This usually happens because you did the VMD > CSV conversion in that folder, and the CSV > VMD conversion generates its file in the same folder.  You'll either run into an overwrite error, or it won't work at all.  Rename your original motion to keep it safe, just not to the exact same name as the .csv file.
3.  Drag and drop the .csv file over vmdconverter.exe.
4.  Your new motion is created in the same folder.
5.  Drag and drop your motion onto a new model.  It should work. [Note: if you failed to reduce the number of facials to below 20,000 points, it will still only register a maximum of 20,000 points.]
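
The name clash from step 2 is easy to check for before converting.  A small sketch using Python's pathlib; the function name is my own.

```python
from pathlib import Path

def conflicting_vmd(csv_path):
    """Return the existing .vmd that shares the .csv file's name, else None."""
    candidate = Path(csv_path).with_suffix(".vmd")
    return candidate if candidate.exists() else None
```

If it returns a path, rename that file first and then drop the .csv onto the converter.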

EXTRA TWEAKING IN MMD/MMM 25:37
1.  Now that you can see your motion data, you can simply delete the remaining facials you don't need, and add facials you do need.
2.  Remember that MMD has a limit of 20,000 facial points, which is shared with the other facials, so if you didn't reduce the count below 20,000, you will encounter an error saying 'unable to regist more than 20000 facial frames' and it will only load up to 20,000 facial points.  In MikuMikuMoving, you can safely have 60,000 facial points.
3. Do note that you need to delete points which correspond to unintended loud sounds, breathing, sneezing, etc.
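
Since the limit is shared with every other facial on the model, it helps to work out how much room is left before loading.  A trivial sketch using the two numbers quoted above; the names are made up.

```python
MMD_FACIAL_LIMIT = 20000   # limit quoted above for MikuMikuDance
MMM_SAFE_FACIALS = 60000   # amount quoted above as safe for MikuMikuMoving

def facial_budget(points_in_motion, points_already_on_model=0, limit=MMD_FACIAL_LIMIT):
    """Facial points left under the limit (a negative result means too many)."""
    return limit - (points_in_motion + points_already_on_model)
```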

SUPER LONG AUDIO FILES
1.  To get lipsync for super long tracks (like the entire lipsync tutorial), split your audio track to several parts, and then perform a lipsync for every part.
2.  After you have cleaned up everything, make sure you change the frame numbers of each part so that its frames appear exactly when they should.
3.  You can then copy and paste the cleaned-up data into a single .csv file, and convert it once you have calculated the number of effective rows.
4.  I managed to reduce approximately 240,000 points down to 50,861 points for the lipsync of the lipsync tutorial.
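
The frame shifting in step 2 and the merging in step 3 can be sketched like this.  It assumes each part's frames start counting from 0 and that you know how long each audio part is in motion frames; the function name is made up.

```python
def merge_parts(parts, part_lengths_in_frames):
    """Shift each part by the total length of the parts before it, then merge.

    parts: one list per audio part of (facial_name, frame, value) rows,
           with frames counted from 0 inside each part.
    part_lengths_in_frames: length of each audio part in motion frames.
    """
    merged, offset = [], 0
    for rows, length in zip(parts, part_lengths_in_frames):
        merged.extend((name, frame + offset, value) for name, frame, value in rows)
        offset += length
    merged.sort(key=lambda row: (row[0], row[1]))  # by facial name, then frame
    return merged
```

The length of the returned list is the new number of effective rows.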

ADDITIONAL TIPS
1.  The lipsync works best when the loud and soft parts of the track are more or less in the same range.  The output may suffer if, say, there are very loud bursts of sound halfway through.  To suppress the problem, use 'Volume Leveling' in your audio editor, and then adjust the decibel volume of the track.  For this purpose, I usually use MediaMonkey to do my volume levelling, but it only works on mp3 files, so I usually combine it with Super to get the audio files I want.

2.  If you want to use this to transcribe English, I suggest using the following conversions so that the lip movements roughly match English pronunciation.
a -> e え
e -> i い
i -> a あ
o -> o お
u -> u う
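
The table above as a lookup, reading each line as 'English vowel letter -> Japanese facial'.  This sketch goes letter by letter, which ignores how a word is actually pronounced, so treat the result as a starting point for manual tweaking.  The names are made up.

```python
# English vowel letter -> Japanese facial, copied from the table above.
ENGLISH_TO_FACIAL = {"a": "え", "e": "い", "i": "あ", "o": "お", "u": "う"}

def facials_for(word):
    """List the facial for each vowel letter in an English word, in order."""
    return [ENGLISH_TO_FACIAL[ch] for ch in word.lower() if ch in ENGLISH_TO_FACIAL]
```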


Potential follow up tutorials
- Cleaning out extremely long audio tracks ~30 mins
- How to use VMDconverter on bones
- How to use mathematics to control bones in excel
© 2014 - 2024 AddestorionVayanis